Title:
BLAST
Personal Author:
Korf, Ian
Edition:
1st ed.
Publication Information:
Sebastopol, CA : O'Reilly & Associates, 2003
ISBN:
9780596002992
General Note:
An essential guide to the Basic Local Alignment Search Tool

Item Barcode:
30000010116886
Call Number:
QH324.2 K67 2003
Material Type:
Open Access Book
Item Category 1:
Book

Summary

Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta Stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes. BLAST (Basic Local Alignment Search Tool) is a sophisticated software package for rapidly searching nucleotide and protein databases. It is one of the most important software packages used in sequence analysis and bioinformatics. Most users of BLAST, however, seldom move beyond the program's default parameters and never take advantage of its full power. BLAST is the only book completely devoted to this popular suite of tools. It offers biologists, computational biology students, and bioinformatics professionals a clear understanding of BLAST as well as the science it supports. This book shows you how to move beyond the default parameters, get specific answers using BLAST, and interpret your results. The book also contains tutorial and reference sections covering NCBI-BLAST and WU-BLAST, background material to help you understand the statistics behind BLAST, Perl scripts to help you prepare your data and analyze your results, and a wealth of tips and tricks for configuring BLAST to meet your own research needs. Some of the topics covered include:

BLAST basics and the NCBI web interface
How to select appropriate search parameters
BLAST programs: BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX, PHI-BLAST, and PSI-BLAST
Detailed BLAST references, including NCBI-BLAST and WU-BLAST
Understanding biological sequences
Sequence similarity, homology, scoring matrices, scores, and evolution
Sequence alignment
Calculating BLAST statistics
Industrial-strength BLAST, including developing applications with Perl and BLAST

BLAST is the only comprehensive reference with detailed, accurate information on optimizing BLAST searches for high-throughput sequence analysis. This is a book that any biologist should own.


Author Notes

Ian Korf received his B.A. from Cornell University and his Ph.D. from Indiana University. His formal training is in molecular biology, but he has had a fondness for computer programming since his early teens. His post-doctoral research at Washington University in St. Louis and at The Wellcome Trust Sanger Institute in the U.K. has focused on genomic sequence analysis with an emphasis on comparative genomics and gene prediction. His goal in life is to follow genomes, wherever they happen to take him.

Mark Yandell received his Ph.D. in Molecular, Cellular and Developmental Biology from the University of Colorado, Boulder. After graduation, he joined the Genome Sequencing Center at Washington University, where he pursued post-doctoral studies in computational biology, genome annotation, and SNP discovery. In 1999 he joined Celera Genomics, where he wrote much of the software used by Celera to annotate and analyze the Drosophila, human, mouse, and mosquito genomes. He recently joined the Berkeley Drosophila Genome Project.

Joseph Bedell received his B.S. in Genetics from the University of Georgia in 1991, then worked on mosquito genetics at the Centers for Disease Control and Prevention in Atlanta. He went on to complete a Ph.D. in human genetics at the University of California, Irvine in 1999. Joseph, like his co-authors, completed a post-doc in mammalian gene annotation with Warren Gish, one of the original developers of BLAST. He is currently the Director of Bioinformatics for Orion Genomics in St. Louis, where he spends his days (and nights) using BLAST to answer important biological and phylogenetic questions in plants.


Table of Contents

Foreword
Preface
Part I. Introduction
1. Hello BLAST
What Is BLAST?
Using NCBI-BLAST
Alternate Output Formats
Alternate Alignment Views
The Next Step
Further Reading
Part II. Theory
2. Biological Sequences
The Central Dogma of Molecular Biology
Evolution
Genomes and Genes
Biological Sequences and Similarity
Further Reading
3. Sequence Alignment
Global Alignment: Needleman-Wunsch
Local Alignment: Smith-Waterman
Dynamic Programming
Algorithmic Complexity
Global Versus Local
Variations
Final Thoughts
Further Reading
4. Sequence Similarity
Introduction to Information Theory
Amino Acid Similarity
Scoring Matrices
Target Frequencies, lambda, and H
Sequence Similarity
Karlin-Altschul Statistics
Sum Statistics and Sum Scores
Further Reading
Part III. Practice
5. BLAST
The Five BLAST Programs
The BLAST Algorithm
Further Reading
6. Anatomy of a BLAST Report
Basic Structure
Alignments
7. A BLAST Statistics Tutorial
Basic BLAST Statistics
Using Statistics to Understand BLAST Results
Where Did My Oligo Go?
8. 20 Tips to Improve Your BLAST Searches
8.1 Don't Use the Default Parameters
8.2 Treat BLAST Searches as Scientific Experiments
8.3 Perform Controls, Especially in the Twilight Zone
8.4 View BLAST Reports Graphically
8.5 Use the Karlin-Altschul Equation to Design Experiments
8.6 When Troubleshooting, Read the Footer First
8.7 Know When to Use Complexity Filters
8.8 Mask Repeats in Genomic DNA
8.9 Segment Large Genomic Sequences
8.10 Be Skeptical of Hypothetical Proteins
8.11 Expect Contaminants in EST Databases
8.12 Use Caution When Searching Raw Sequencing Reads
8.13 Look for Stop Codons and Frame-Shifts to find Pseudo-Genes
8.14 Consider Using Ungapped Alignment for BLASTX, TBLASTN, and TBLASTX
8.15 Look for Gaps in Coverage as a Sign of Missed Exons
8.16 Parse BLAST Reports with Bioperl
8.17 Perform Pilot Experiments
8.18 Examine Statistical Outliers
8.19 Use links and topcomboN to Make Sense of Alignment Groups
8.20 How to Lie with BLAST Statistics
9. BLAST Protocols
BLASTN Protocols
BLASTP Protocols
BLASTX Protocols
TBLASTN Protocols
TBLASTX Protocols
Part IV. Industrial-Strength BLAST
10. Installation and Command-Line Tutorial
NCBI-BLAST Installation
WU-BLAST Installation
Command-Line Tutorial
Editing Scoring Matrices
11. BLAST Databases
FASTA Files
BLAST Databases
Sequence Databases
Sequence Database Management Strategies
12. Hardware and Software Optimizations
The Persistence of Memory
CPUs and Computer Architecture
Compute Clusters
Distributed Resource Management
Software Tricks
Optimized NCBI-BLAST
Part V. BLAST Reference
13. NCBI-BLAST Reference
Usage Statements
Command-Line Syntax
blastall Parameters
formatdb Parameters
fastacmd Parameters
megablast Parameters
bl2seq Parameters
blastpgp Parameters (PSI-BLAST and PHI-BLAST)
blastclust Parameters
14. WU-BLAST Reference
Usage Statements
Command-Line Syntax
WU-BLAST Parameters
xdformat Parameters
xdget Parameters
Part VI. Appendixes
A. NCBI Display Formats
B. Nucleotide Scoring Schemes
C. NCBI-BLAST Scoring Schemes
D. blast-imager.pl
E. blast2table.pl
Glossary
Index