Available:
Library | Item Barcode | Call Number | Material Type | Item Category | Status |
---|---|---|---|---|---|
 | 30000010210385 | QH324.2 B87 2009 | Open Access Book | Book | On Order |
Summary
The Beauty of Protein Structures and the Mathematics behind Structural Bioinformatics
Providing the framework for a one-semester undergraduate course, Structural Bioinformatics: An Algorithmic Approach shows how to apply key algorithms to solve problems related to macromolecular structure.
Helps Students Go Further in Their Study of Structural Biology
Following some introductory material in the first few chapters, the text solves the longest common subsequence problem using dynamic programming and explains the scientific models underlying the Nussinov and MFOLD algorithms. It then reviews sequence alignment, along with the basic mathematical calculations needed for measuring the geometric properties of macromolecules. After looking at how coordinate transformations facilitate the translation and rotation of molecules in 3D space, the author introduces structural comparison techniques, superposition algorithms, and algorithms that compare relationships within a protein. The final chapter explores how regression and classification are becoming more useful in protein analysis and drug design.
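The longest common subsequence problem mentioned above can be sketched with a short dynamic-programming routine. This is an illustrative sketch, not the book's own code; the function name `lcs_length` and the use of Python are our assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b,
    computed with the classic dynamic-programming recurrence."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop a character from one sequence or the other
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Example: the LCS of "AGGTAB" and "GXTXAYB" is "GTAB", length 4
print(lcs_length("AGGTAB", "GXTXAYB"))
```

The same table-filling pattern, with a different recurrence, reappears in the book's treatment of the Nussinov algorithm and of sequence alignment.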
At the Crossroads of Biology, Mathematics, and Computer Science
Connecting biology, mathematics, and computer science, this practical text presents various bioinformatics topics and problems within a scientific methodology that emphasizes nature (the source of empirical observations), science (the mathematical modeling of the natural process), and computation (the science of calculating predictions and mathematical objects based on mathematical models).
Table of Contents
Preface | p. xix |
Acknowledgments | p. xxiii |
Chapter 1 The Study of Structural Bioinformatics | p. 1 |
1.1 Motivation | p. 1 |
1.2 Small Beginnings | p. 4 |
1.3 Structural Bioinformatics and the Scientific Method | p. 5 |
1.3.1 Three Realms: Nature, Science, and Computation | p. 6 |
1.3.2 Hypothesis, Model, and Theory | p. 8 |
1.3.3 Laws, Postulates, and Assumptions | p. 12 |
1.3.4 Model Theory and Computational Theory | p. 13 |
1.3.5 Different Assumptions for Different Models | p. 14 |
1.4 A More Detailed Problem Analysis: Force Fields | p. 15 |
1.4.1 Nature | p. 16 |
1.4.2 Science | p. 16 |
1.4.2.1 Energy Terms for Bonded Atoms | p. 16 |
1.4.2.2 Energy Terms for Nonbonded Atoms | p. 19 |
1.4.2.3 Total Potential Energy | p. 21 |
1.4.3 Computation | p. 21 |
1.5 Modeling Issues | p. 25 |
1.5.1 Rashomon | p. 26 |
1.5.2 Ockham | p. 26 |
1.5.3 Bellman | p. 27 |
1.5.4 Interpretability | p. 28 |
1.5.5 Refutability | p. 29 |
1.5.6 Complexity and Approximation | p. 29 |
1.6 Sources of Error | p. 32 |
1.7 Summary | p. 33 |
1.8 Exercises | p. 34 |
References | p. 36 |
Chapter 2 Introduction to Macromolecular Structure | p. 37 |
2.1 Motivation | p. 37 |
2.2 Overview of Protein Structure | p. 38 |
2.2.1 Amino Acids and Primary Sequence | p. 38 |
2.2.2 Secondary Structure | p. 44 |
2.2.2.1 Alpha Helices | p. 44 |
2.2.2.2 Beta Strands | p. 47 |
2.2.2.3 Loops | p. 52 |
2.2.3 Tertiary Structure | p. 53 |
2.2.3.1 What Is Tertiary Structure? | p. 54 |
2.2.3.2 The Tertiary Structure of Myoglobin | p. 54 |
2.2.3.3 Tertiary Structure Beyond the Binding Pocket | p. 58 |
2.2.4 Quaternary Structure | p. 64 |
2.2.5 Protein Functionality | p. 67 |
2.2.6 Protein Domains | p. 68 |
2.3 An Overview of RNA Structure | p. 70 |
2.3.1 Nucleotides and RNA Primary Sequence | p. 71 |
2.3.2 RNA Secondary Structure | p. 72 |
2.3.3 RNA Tertiary Structure | p. 75 |
2.4 Exercises | p. 78 |
References | p. 80 |
Chapter 3 Data Sources, Formats, and Applications | p. 83 |
3.1 Motivation | p. 83 |
3.2 Sources of Structural Data | p. 84 |
3.2.1 PDB: The Protein Data Bank | p. 84 |
3.2.2 PDBsum: The PDB Summary | p. 86 |
3.2.3 SCOP: Structural Classification of Proteins | p. 86 |
3.2.4 CATH: The CATH Hierarchy | p. 88 |
3.2.5 PubChem | p. 92 |
3.2.6 DrugBank | p. 94 |
3.3 PDB File Format | p. 95 |
3.4 Visualization of Molecular Data | p. 98 |
3.4.1 Plug-In versus Stand-Alone | p. 99 |
3.4.2 Change of Viewing Perspective | p. 99 |
3.4.3 Graphical Representation | p. 99 |
3.4.4 Visual Effects | p. 101 |
3.4.5 Selection Abilities | p. 101 |
3.4.6 Computational Tools | p. 102 |
3.4.7 Extras | p. 102 |
3.5 Software for Structural Bioinformatics | p. 103 |
3.5.1 PyMOL | p. 103 |
3.5.2 Eclipse | p. 103 |
3.5.3 MarvinSketch | p. 104 |
3.5.4 ACD/ChemSketch | p. 104 |
3.5.5 JOELib2 | p. 105 |
3.5.6 Chemistry Development Kit (CDK) | p. 105 |
3.5.7 BioPython | p. 105 |
3.6 Exercises | p. 106 |
References | p. 109 |
Chapter 4 Dynamic Programming | p. 111 |
4.1 Motivation | p. 111 |
4.2 Introduction | p. 112 |
4.3 A DP Example: The Al Gore Rhythm For Giving Talks | p. 112 |
4.3.1 Problem Statement | p. 112 |
4.3.2 Terminology: Configurations and Scores | p. 113 |
4.3.3 Analysis of Our Given Problem | p. 113 |
4.4 A Recipe for Dynamic Programming | p. 116 |
4.5 Longest Common Subsequence | p. 116 |
4.5.1 Problem Statement | p. 117 |
4.5.2 Prefixes | p. 118 |
4.5.3 Relations Among Subproblems | p. 118 |
4.5.4 A Recurrence for the LCS | p. 119 |
4.6 Exercises | p. 123 |
Chapter 5 RNA Secondary Structure Prediction | p. 125 |
5.1 Motivation | p. 126 |
5.2 Introduction to the Problem | p. 128 |
5.2.1 Nature | p. 129 |
5.2.1.1 Where Do Hydrogen Bonds Form? | p. 129 |
5.2.1.2 Thermodynamic Issues | p. 130 |
5.2.1.3 Consensus Sequence Patterns | p. 132 |
5.2.1.4 Complications | p. 133 |
5.2.2 Science | p. 133 |
5.2.2.1 Modeling Secondary Structure | p. 133 |
5.2.2.2 Single Base Pairs | p. 134 |
5.2.2.3 Stacking Energy Models | p. 134 |
5.2.3 Computation | p. 138 |
5.2.3.1 Display of Secondary Structure | p. 139 |
5.2.4 Restating the Problem | p. 145 |
5.3 The Nussinov Dynamic Programming Algorithm | p. 146 |
5.3.1 Execution Time | p. 155 |
5.4 The MFOLD Algorithm: Terminology | p. 155 |
5.4.1 The MFOLD Algorithm: Recursion | p. 160 |
5.4.2 MFOLD Extensions | p. 162 |
5.4.3 MFOLD Execution Time | p. 162 |
5.5 Exercises | p. 163 |
References | p. 164 |
Chapter 6 Protein Sequence Alignment | p. 167 |
6.1 Protein Homology | p. 167 |
6.1.1 Nature | p. 168 |
6.1.2 Science | p. 170 |
6.1.2.1 Partial Matches | p. 172 |
6.1.2.2 Building a BLOSUM Matrix | p. 173 |
6.1.2.3 Gaps | p. 179 |
6.1.2.4 Summary | p. 180 |
6.1.3 Computation | p. 180 |
6.1.3.1 Subproblem Specification | p. 181 |
6.1.3.2 Scoring Alignments | p. 181 |
6.1.3.3 Suitability of the Subproblem | p. 182 |
6.1.3.4 A Global Alignment Example | p. 186 |
6.2 Variations in the Global Alignment Algorithm | p. 186 |
6.3 The Significance of a Global Alignment | p. 187 |
6.3.1 Computer-Assisted Comparison | p. 188 |
6.3.2 Percentage Identity Comparison | p. 189 |
6.4 Local Alignment | p. 190 |
6.5 Exercises | p. 193 |
References | p. 195 |
Chapter 7 Protein Geometry | p. 197 |
7.1 Motivation | p. 197 |
7.2 Introduction | p. 198 |
7.3 Calculations Related to Protein Geometry | p. 198 |
7.3.1 Interatomic Distance | p. 198 |
7.3.2 Bond Angle | p. 198 |
7.3.3 Dihedral Angles | p. 199 |
7.3.3.1 Defining Dihedral Angles | p. 199 |
7.3.3.2 Computation of a Normal | p. 201 |
7.3.3.3 Calculating the Phi Dihedral Angle | p. 204 |
7.3.3.4 Sign of the Dihedral Angle | p. 204 |
7.3.3.5 Calculating the Psi Dihedral Angle | p. 206 |
7.4 Ramachandran Plots | p. 206 |
7.5 Inertial Axes | p. 212 |
7.6 Exercises | p. 216 |
References | p. 220 |
Chapter 8 Coordinate Transformations | p. 223 |
8.1 Motivation | p. 223 |
8.2 Introduction | p. 224 |
8.3 Translation Transformations | p. 224 |
8.3.1 Translation to Find Centroid at the Origin | p. 224 |
8.4 Rotation Transformations | p. 225 |
8.4.1 Rotation Transformations in the Plane | p. 226 |
8.4.2 Rotations in 3-D Space | p. 227 |
8.5 Isometric Transformations | p. 231 |
8.5.1 Our Setting Is a Euclidean Vector Space | p. 232 |
8.5.2 Orthogonality of A Implies Isometry of T | p. 232 |
8.5.3 Isometry of T Implies Orthogonality of A | p. 233 |
8.5.4 Preservation of Angles | p. 234 |
8.5.5 More Isometries | p. 234 |
8.5.6 Back to Rotations in the Plane | p. 235 |
8.5.7 Rotations in the 3-D Space: A Summary | p. 238 |
8.6 Exercises | p. 238 |
References | p. 239 |
Chapter 9 Structure Comparison, Alignment, and Superposition | p. 241 |
9.1 Motivation | p. 242 |
9.2 Introduction | p. 245 |
9.2.1 Specifying the Problem | p. 245 |
9.3 Techniques for Structural Comparison | p. 246 |
9.4 Scoring Similarities and Optimizing Scores | p. 247 |
9.5 Superposition Algorithms | p. 247 |
9.5.1 Overview | p. 247 |
9.5.2 Characterizing the Superposition Algorithm | p. 249 |
9.5.3 Formal Problem Description | p. 249 |
9.5.4 Computations to Achieve Maximal Overlap | p. 251 |
9.5.5 Summary | p. 257 |
9.5.6 Measuring Overlap | p. 259 |
9.5.6.1 Calculation of the Root Mean Square Deviation (RMSD) | p. 259 |
9.5.6.2 RMSD Issues | p. 259 |
9.5.7 Dealing with Weaker Sequence Similarity | p. 260 |
9.5.8 Strategies Based on a Distance Matrix | p. 261 |
9.6 Algorithms Comparing Relationships within Proteins | p. 263 |
9.6.1 Dali | p. 263 |
9.6.2 SSAP | p. 267 |
9.6.2.1 Motivation | p. 267 |
9.6.2.2 Introduction to SSAP | p. 269 |
9.6.2.3 Overview of SSAP | p. 271 |
9.6.2.4 Calculating the Views | p. 272 |
9.6.2.5 Building the Consensus Matrix | p. 272 |
9.6.2.6 Compute the Optimal Path in the Consensus Matrix | p. 278 |
9.7 Exercises | p. 279 |
References | p. 282 |
Chapter 10 Machine Learning | p. 285 |
10.1 Motivation | p. 285 |
10.2 Issues of Complexity | p. 287 |
10.2.1 Computational Scalability | p. 287 |
10.2.2 Intrinsic Complexity | p. 287 |
10.2.3 Inadequate Knowledge | p. 288 |
10.3 Prediction Via Machine Learning | p. 289 |
10.3.1 Training and Testing | p. 291 |
10.4 Types of Learning | p. 292 |
10.4.1 Types of Supervised Learning | p. 293 |
10.4.2 Supervised Learning: Notation and Formal Definitions | p. 293 |
10.5 Objectives of the Learning Algorithm | p. 294 |
10.6 Linear Regression | p. 295 |
10.7 Ridge Regression | p. 297 |
10.7.1 Predictors and Data Recording | p. 299 |
10.7.2 Underfitting and Overfitting | p. 300 |
10.8 Preamble for Kernel Methods | p. 300 |
10.9 Kernel Functions | p. 303 |
10.9.1 The "Kernel Trick" | p. 304 |
10.9.2 Design Issues | p. 305 |
10.9.3 Validation Data Sets | p. 306 |
10.9.3.1 Holdout Validation | p. 307 |
10.9.3.2 N-Fold Cross Validation | p. 307 |
10.10 Classification | p. 308 |
10.10.1 Classification as Machine Learning | p. 309 |
10.10.2 Ad Hoc Classification | p. 310 |
10.11 Heuristics for Classification | p. 311 |
10.11.1 Feature Weighting | p. 311 |
10.12 Nearest Neighbor Classification | p. 312 |
10.12.1 Delaunay and Voronoi | p. 313 |
10.12.2 Nearest Neighbor Time and Space Issues | p. 315 |
10.13 Support Vector Machines | p. 315 |
10.13.1 Linear Discrimination | p. 315 |
10.13.2 Margin of Separation | p. 318 |
10.13.3 Support Vectors | p. 319 |
10.13.4 The SVM as an Optimization Problem | p. 320 |
10.13.5 The Karush-Kuhn-Tucker Condition | p. 322 |
10.13.6 Evaluation of w₀ | p. 322 |
10.14 Linearly Nonseparable Data | p. 323 |
10.14.1 Parameter Values | p. 326 |
10.14.2 Evaluation of w₀ (Soft Margin Case) | p. 327 |
10.14.3 Classification with Soft Margin | p. 327 |
10.15 Support Vector Machines and Kernels | p. 328 |
10.16 Expected Test Error | p. 328 |
10.17 Transparency | p. 329 |
10.18 Exercises | p. 331 |
References | p. 334 |
Appendices | p. 337 |
Index | p. 385 |