Available:
Item Barcode | Call Number | Material Type | Item Category |
---|---|---|---|
30000010116886 | QH324.2 K67 2003 | Open Access Book | Book |
Summary
Sequence similarity is a powerful tool for discovering biological function. Just as the ancient Greeks used comparative anatomy to understand the human body and linguists used the Rosetta Stone to decipher Egyptian hieroglyphs, today we can use comparative sequence analysis to understand genomes. BLAST (Basic Local Alignment Search Tool) is a sophisticated software package for rapid searching of nucleotide and protein databases, and one of the most important software packages used in sequence analysis and bioinformatics. Most users of BLAST, however, seldom move beyond the program's default parameters and never take advantage of its full power. BLAST is the only book completely devoted to this popular suite of tools. It offers biologists, computational biology students, and bioinformatics professionals a clear understanding of BLAST as well as the science it supports. This book shows you how to move beyond the default parameters, how to get specific answers using BLAST, and how to interpret your results. The book also contains tutorial and reference sections covering NCBI-BLAST and WU-BLAST, background material to help you understand the statistics behind BLAST, Perl scripts to help you prepare your data and analyze your results, and a wealth of tips and tricks for configuring BLAST to meet your own research needs. Some of the topics covered include:
- BLAST basics and the NCBI web interface
- How to select appropriate search parameters
- BLAST programs: BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX, PHI-BLAST, and PSI-BLAST
- Detailed BLAST references, including NCBI-BLAST and WU-BLAST
- Understanding biological sequences
- Sequence similarity, homology, scoring matrices, scores, and evolution
- Sequence alignment
- Calculating BLAST statistics
- Industrial-strength BLAST, including developing applications with Perl and BLAST
BLAST is the only comprehensive reference with detailed, accurate information on optimizing BLAST searches for high-throughput sequence analysis. This is a book that any biologist should own.
Author Notes
Ian Korf received his B.A. from Cornell University and his Ph.D. from Indiana University. His formal training is in molecular biology, but he has had a fondness for computer programming since his early teens. His post-doctoral research at Washington University in St. Louis and at The Wellcome Trust Sanger Institute in the U.K. has focused on genomic sequence analysis with an emphasis on comparative genomics and gene prediction. His goal in life is to follow genomes, wherever they happen to take him.
Mark Yandell received his Ph.D. in Molecular, Cellular and Developmental Biology from the University of Colorado, Boulder. After graduation, he joined the Genome Sequencing Center at Washington University, where he pursued post-doctoral studies in computational biology, genome annotation, and SNP discovery. In 1999 he joined Celera Genomics, where he wrote much of the software used by Celera to annotate and analyze the Drosophila, human, mouse, and mosquito genomes. He recently joined the Berkeley Drosophila Genome Project.
Joseph Bedell received his B.S. in Genetics from the University of Georgia in 1991, then worked on mosquito genetics at the Centers for Disease Control and Prevention in Atlanta. He went on to complete a Ph.D. in human genetics at the University of California, Irvine, in 1999. Joseph, like his co-authors, completed a post-doc in mammalian gene annotation with Warren Gish, one of the original developers of BLAST. He is currently the Director of Bioinformatics for Orion Genomics in St. Louis, where he spends his days (and nights) using BLAST to answer important biological and phylogenetic questions in plants.
Table of Contents
Foreword | p. xi |
Preface | p. xiii |
Part I. Introduction | |
1. Hello BLAST | p. 3 |
What Is BLAST? | p. 3 |
Using NCBI-BLAST | p. 4 |
Alternate Output Formats | p. 12 |
Alternate Alignment Views | p. 13 |
The Next Step | p. 14 |
Further Reading | p. 15 |
Part II. Theory | |
2. Biological Sequences | p. 19 |
The Central Dogma of Molecular Biology | p. 19 |
Evolution | p. 27 |
Genomes and Genes | p. 35 |
Biological Sequences and Similarity | p. 38 |
Further Reading | p. 39 |
3. Sequence Alignment | p. 40 |
Global Alignment: Needleman-Wunsch | p. 40 |
Local Alignment: Smith-Waterman | p. 46 |
Dynamic Programming | p. 50 |
Algorithmic Complexity | p. 50 |
Global Versus Local | p. 50 |
Variations | p. 51 |
Final Thoughts | p. 53 |
Further Reading | p. 53 |
4. Sequence Similarity | p. 55 |
Introduction to Information Theory | p. 55 |
Amino Acid Similarity | p. 57 |
Scoring Matrices | p. 59 |
Target Frequencies, lambda, and H | p. 60 |
Sequence Similarity | p. 64 |
Karlin-Altschul Statistics | p. 65 |
Sum Statistics and Sum Scores | p. 67 |
Further Reading | p. 70 |
Part III. Practice | |
5. BLAST | p. 75 |
The Five BLAST Programs | p. 75 |
The BLAST Algorithm | p. 76 |
Further Reading | p. 87 |
6. Anatomy of a BLAST Report | p. 88 |
Basic Structure | p. 88 |
Alignments | p. 90 |
7. A BLAST Statistics Tutorial | p. 96 |
Basic BLAST Statistics | p. 96 |
Using Statistics to Understand BLAST Results | p. 109 |
Where Did My Oligo Go? | p. 109 |
8. 20 Tips to Improve Your BLAST Searches | p. 116 |
8.1 Don't Use the Default Parameters | p. 116 |
8.2 Treat BLAST Searches as Scientific Experiments | p. 116 |
8.3 Perform Controls, Especially in the Twilight Zone | p. 117 |
8.4 View BLAST Reports Graphically | p. 118 |
8.5 Use the Karlin-Altschul Equation to Design Experiments | p. 119 |
8.6 When Troubleshooting, Read the Footer First | p. 119 |
8.7 Know When to Use Complexity Filters | p. 120 |
8.8 Mask Repeats in Genomic DNA | p. 121 |
8.9 Segment Large Genomic Sequences | p. 121 |
8.10 Be Skeptical of Hypothetical Proteins | p. 123 |
8.11 Expect Contaminants in EST Databases | p. 123 |
8.12 Use Caution When Searching Raw Sequencing Reads | p. 124 |
8.13 Look for Stop Codons and Frame-Shifts to Find Pseudo-Genes | p. 124 |
8.14 Consider Using Ungapped Alignment for BLASTX, TBLASTN, and TBLASTX | p. 124 |
8.15 Look for Gaps in Coverage as a Sign of Missed Exons | p. 126 |
8.16 Parse BLAST Reports with Bioperl | p. 126 |
8.17 Perform Pilot Experiments | p. 128 |
8.18 Examine Statistical Outliers | p. 128 |
8.19 Use links and topcomboN to Make Sense of Alignment Groups | p. 128 |
8.20 How to Lie with BLAST Statistics | p. 128 |
9. BLAST Protocols | p. 130 |
BLASTN Protocols | p. 131 |
BLASTP Protocols | p. 144 |
BLASTX Protocols | p. 147 |
TBLASTN Protocols | p. 152 |
TBLASTX Protocols | p. 155 |
Part IV. Industrial-Strength BLAST | |
10. Installation and Command-Line Tutorial | p. 161 |
NCBI-BLAST Installation | p. 161 |
WU-BLAST Installation | p. 166 |
Command-Line Tutorial | p. 170 |
Editing Scoring Matrices | p. 186 |
11. BLAST Databases | p. 188 |
FASTA Files | p. 188 |
BLAST Databases | p. 193 |
Sequence Databases | p. 198 |
Sequence Database Management Strategies | p. 206 |
12. Hardware and Software Optimizations | p. 213 |
The Persistence of Memory | p. 213 |
CPUs and Computer Architecture | p. 215 |
Compute Clusters | p. 216 |
Distributed Resource Management | p. 218 |
Software Tricks | p. 220 |
Optimized NCBI-BLAST | p. 224 |
Part V. BLAST Reference | |
13. NCBI-BLAST Reference | p. 229 |
Usage Statements | p. 229 |
Command-Line Syntax | p. 229 |
blastall Parameters | p. 230 |
formatdb Parameters | p. 240 |
fastacmd Parameters | p. 242 |
megablast Parameters | p. 245 |
bl2seq Parameters | p. 252 |
blastpgp Parameters (PSI-BLAST and PHI-BLAST) | p. 256 |
blastclust Parameters | p. 264 |
14. WU-BLAST Reference | p. 267 |
Usage Statements | p. 268 |
Command-Line Syntax | p. 268 |
WU-BLAST Parameters | p. 269 |
xdformat Parameters | p. 281 |
xdget Parameters | p. 285 |
Part VI. Appendixes | |
A. NCBI Display Formats | p. 291 |
B. Nucleotide Scoring Schemes | p. 299 |
C. NCBI-BLAST Scoring Schemes | p. 302 |
D. blast-imager.pl | p. 305 |
E. blast2table.pl | p. 309 |
Glossary | p. 313 |
Index | p. 319 |