Cover image for Structural bioinformatics : an algorithmic approach
Title:
Structural bioinformatics : an algorithmic approach
Personal Author:
Series:
Chapman & Hall/CRC mathematical and computational biology series
Publication Information:
Boca Raton, FL : Chapman & Hall, 2009
Physical Description:
xxiii, 406 p., [12] p. of plates : ill. (some col.) ; 25 cm.
ISBN:
9781584886839

Available:*

Library
Item Barcode
Call Number
Material Type
Item Category 1
Status
Searching...
30000010210385 QH324.2 B87 2009 Open Access Book Book
Searching...

On Order

Summary

Summary

The Beauty of Protein Structures and the Mathematics behind Structural Bioinformatics
Providing the framework for a one-semester undergraduate course, Structural Bioinformatics: An Algorithmic Approach shows how to apply key algorithms to solve problems related to macromolecular structure.

Helps Students Go Further in Their Study of Structural Biology
Following some introductory material in the first few chapters, the text solves the longest common subsequence problem using dynamic programming and explains the science models for the Nussinov and MFOLD algorithms. It then reviews sequence alignment, along with the basic mathematical calculations needed for measuring the geometric properties of macromolecules. After looking at how coordinate transformations facilitate the translation and rotation of molecules in a 3D space, the author introduces structural comparison techniques, superposition algorithms, and algorithms that compare relationships within a protein. The final chapter explores how regression and classification are becoming more useful in protein analysis and drug design.

At the Crossroads of Biology, Mathematics, and Computer Science
Connecting biology, mathematics, and computer science, this practical text presents various bioinformatics topics and problems within a scientific methodology that emphasizes nature (the source of empirical observations), science (the mathematical modeling of the natural process), and computation (the science of calculating predictions and mathematical objects based on mathematical models).


Table of Contents

Prefacep. xix
Acknowledgmentsp. xxiii
Chapter 1 The Study of Structural Bioinformaticsp. 1
1.1 Motivationp. 1
1.2 Small Beginningsp. 4
1.3 Structural Bioinformatics and the Scientific Methodp. 5
1.3.1 Three Realms: Nature, Science, and Computationp. 6
1.3.2 Hypothesis, Model, and Theoryp. 8
1.3.3 Laws, Postulates, and Assumptionsp. 12
1.3.4 Model Theory and Computational Theoryp. 13
1.3.5 Different Assumptions for Different Modelsp. 14
1.4 A More Detailed Problem Analysis: Force Fieldsp. 15
1.4.1 Naturep. 16
1.4.2 Sciencep. 16
1.4.2.1 Energy Terms for Bonded Atomsp. 16
1.4.2.2 Energy Terms for Nonbonded Atomsp. 19
1.4.2.3 Total Potential Energyp. 21
1.4.3 Computationp. 21
1.5 Modeling Issuesp. 25
1.5.1 Rashomonp. 26
1.5.2 Ockhamp. 26
1.5.3 Bellmanp. 27
1.5.4 Interpretabilityp. 28
1.5.5 Refutabilityp. 29
1.5.6 Complexity and Approximationp. 29
1.6 Sources of Errorp. 32
1.7 Summaryp. 33
1.8 Exercisesp. 34
Referencesp. 36
Chapter 2 Introduction to Macromolecular Structurep. 37
2.1 Motivationp. 37
2.2 Overview of Protein Structurep. 38
2.2.1 Amino Acids and Primary Sequencep. 38
2.2.2 Secondary Structurep. 44
2.2.2.1 Alpha Helicesp. 44
2.2.2.2 Beta Strandsp. 47
2.2.2.3 Loopsp. 52
2.2.3 Tertiary Structurep. 53
2.2.3.1 What Is Tertiary Structure?p. 54
2.2.3.2 The Tertiary Structure of Myoglobinp. 54
2.2.3.3 Tertiary Structure Beyond the Binding Pocketp. 58
2.2.4 Quaternary Structurep. 64
2.2.5 Protein Functionalityp. 67
2.2.6 Protein Domainsp. 68
2.3 An Overview of Rna Structurep. 70
2.3.1 Nucleotides and RNA Primary Sequencep. 71
2.3.2 RNA Secondary Structurep. 72
2.3.3 RNA Tertiary Structurep. 75
2.4 Exercisesp. 78
Referencesp. 80
Chapter 3 Data Sources, Formats, and Applicationsp. 83
3.1 Motivationp. 83
3.2 Sources of Structural Datap. 84
3.2.1 PDB: The Protein Data Bankp. 84
3.2.2 PDBsum: The PDB Summaryp. 86
3.2.3 SCOP: Structural Classification of Proteinsp. 86
3.2.4 CATH: The CATH Hierarchyp. 88
3.2.5 PubChemp. 92
3.2.6 DrugBankp. 94
3.3 PDB File Formatp. 95
3.4 Visualization of Molecular Datap. 98
3.4.1 Plug-In versus Stand-Alonep. 99
3.4.2 Change of Viewing Perspectivep. 99
3.4.3 Graphical Representationp. 99
3.4.4 Visual Effectsp. 101
3.4.5 Selection Abilitiesp. 101
3.4.6 Computational Toolsp. 102
3.4.7 Extrasp. 102
3.5 Software for Structural Bioinformaticsp. 103
3.5.1 PyMOLp. 103
3.5.2 Eclipsep. 103
3.5.3 MarvinSketchp. 104
3.5.4 ACD/ChemSketchp. 104
3.5.5 JOELib2p. 105
3.5.6 Chemistry Development Kit (CDK)p. 105
3.5.7 BioPythonp. 105
3.6 Exercisesp. 106
Referencesp. 109
Chapter 4 Dynamic Programmingp. 111
4.1 Motivationp. 111
4.2 Introductionp. 112
4.3 A DP Example: The Al Gore Rhythm For Giving Talksp. 112
4.3.1 Problem Statementp. 112
4.3.2 Terminology: Configurations and Scoresp. 113
4.3.3 Analysis of Our Given Problemp. 113
4.4 A Recipe for Dynamic Programmingp. 116
4.5 Longest Common Subsequencep. 116
4.5.1 Problem Statementp. 117
4.5.2 Prefixesp. 118
4.5.3 Relations Among Subproblemsp. 118
4.5.4 A Recurrence for the LCSp. 119
4.6 Exercisesp. 123
Chapter 5 RNA Secondary Structure Predictionp. 125
5.1 Motivationp. 126
5.2 Introduction to the Problemp. 128
5.2.1 Naturep. 129
5.2.1.1 Where Do Hydrogen Bonds Form?p. 129
5.2.1.2 Thermodynamic Issuesp. 130
5.2.1.3 Consensus Sequence Patternsp. 132
5.2.1.4 Complicationsp. 133
5.2.2 Sciencep. 133
5.2.2.1 Modeling Secondary Structurep. 133
5.2.2.2 Single Base Pairsp. 134
5.2.2.3 Stacking Energy Modelsp. 134
5.2.3 Computationp. 138
5.2.3.1 Display of Secondary Structurep. 139
5.2.4 Restating the Problemp. 145
5.3 The Nussinov Dynamic Programming Algorithmp. 146
5.3.1 Execution Timep. 155
5.4 The Mfold Algorithm: Terminologyp. 155
5.4.1 The MFOLD Algorithm: Recursionp. 160
5.4.2 MFOLD Extensionsp. 162
5.4.3 MFOLD Execution Timep. 162
5.5 Exercisesp. 163
Referencesp. 164
Chapter 6 Protein Sequence Alignmentp. 167
6.1 Protein Homologyp. 167
6.1.1 Naturep. 168
6.1.2 Sciencep. 170
6.1.2.1 Partial Matchesp. 172
6.1.2.2 Building a BLOSUM Matrixp. 173
6.1.2.3 Gapsp. 179
6.1.2.4 Summaryp. 180
6.1.3 Computationp. 180
6.1.3.1 Subproblem Specificationp. 181
6.1.3.2 Scoring Alignmentsp. 181
6.1.3.3 Suitability of the Subproblemp. 182
6.1.3.4 A Global Alignment Examplep. 186
6.2 Variations in the Global Alignment Algorithmp. 186
6.3 The Significance of a Global Alignmentp. 187
6.3.1 Computer-Assisted Comparisonp. 188
6.3.2 Percentage Identity Comparisonp. 189
6.4 Local Alignmentp. 190
6.5 Exercisesp. 193
Referencesp. 195
Chapter 7 Protein Geometryp. 197
7.1 Motivationp. 197
7.2 Introductionp. 198
7.3 Calculations Related to Protein Geometryp. 198
7.3.1 Interatomic Distancep. 198
7.3.2 Bond Anglep. 198
7.3.3 Dihedral Anglesp. 199
7.3.3.1 Defining Dihedral Anglesp. 199
7.3.3.2 Computation of a Normalp. 201
7.3.3.3 Calculating the Phi Dihedral Anglep. 204
7.3.3.4 Sign of the Dihedral Anglep. 204
7.3.3.5 Calculating the Psi Dihedral Anglep. 206
7.4 Ramachandran Plotsp. 206
7.5 Inertial Axesp. 212
7.6 Exercisesp. 216
Referencesp. 220
Chapter 8 Coordinate Transformationsp. 223
8.1 Motivationp. 223
8.2 Introductionp. 224
8.3 Translation Transformationsp. 224
8.3.1 Translation to Find Centroid at the Originp. 224
8.4 Rotation Transformationsp. 225
8.4.1 Rotation Transformations in the Planep. 226
8.4.2 Rotations in 3-D Spacep. 227
8.5 Isometric Transformationsp. 231
8.5.1 Our Setting Is a Euclidean Vector Spacep. 232
8.5.2 Orthogonality of A Implies Isometry of Tp. 232
8.5.3 Isometry of T Implies Orthogonality of Ap. 233
8.5.4 Preservation of Anglesp. 234
8.5.5 More Isometriesp. 234
8.5.6 Back to Rotations in the Planep. 235
8.5.7 Rotations in the 3-D Space: A Summaryp. 238
8.6 Exercisesp. 238
Referencesp. 239
Chapter 9 Structure Comparison, Alignment, and Superpositionp. 241
9.1 Motivationp. 242
9.2 Introductionp. 245
9.2.1 Specifying the Problemp. 245
9.3 Techniques for Structural Comparisonp. 246
9.4 Scoring Similarities and Optimizing Scoresp. 247
9.5 Superposition Algorithmsp. 247
9.5.1 Overviewp. 247
9.5.2 Characterizing the Superposition Algorithmp. 249
9.5.3 Formal Problem Descriptionp. 249
9.5.4 Computations to Achieve Maximal Overlapp. 251
9.5.5 Summaryp. 257
9.5.6 Measuring Overlapp. 259
9.5.6.1 Calculation of the Root Mean Square Deviation (RMSD)p. 259
9.5.6.2 RMSD Issuesp. 259
9.5.7 Dealing with Weaker Sequence Similarityp. 260
9.5.8 Strategies Based on a Distance Matrixp. 261
9.6 Algorithms Comparing Relationships within Proteinsp. 263
9.6.1 Dalip. 263
9.6.2 SSAPp. 267
9.6.2.1 Motivationp. 267
9.6.2.2 Introduction to SSAPp. 269
9.6.2.3 Overview of SSAPp. 271
9.6.2.4 Calculating the Viewsp. 272
9.6.2.5 Building the Consensus Matrixp. 272
9.6.2.6 Compute the Optimal Path in the Consensus Matrixp. 278
9.7 Exercisesp. 279
Referencesp. 282
Chapter 10 Machine Learningp. 285
10.1 Motivationp. 285
10.2 Issues of Complexityp. 287
10.2.1 Computational Scalabilityp. 287
10.2.2 Intrinsic Complexityp. 287
10.2.3 Inadequate Knowledgep. 288
10.3 Prediction Via Machine Learningp. 289
10.3.1 Training and Testingp. 291
10.4 Types of Learningp. 292
10.4.1 Types of Supervised Learningp. 293
10.4.2 Supervised Learning: Notation and Formal Definitionsp. 293
10.5 Objectives of the Learning Algorithmp. 294
10.6 Linear Regressionp. 295
10.7 Ridge Regressionp. 297
10.7.1 Predictors and Data Recordingp. 299
10.7.2 Underfitting and Overfittingp. 300
10.8 Preamble for Kernel Methodsp. 300
10.9 Kernel Functionsp. 303
10.9.1 The "Kernel Trick"p. 304
10.9.2 Design Issuesp. 305
10.9.3 Validation Data Setsp. 306
10.9.3.1 Holdout Validationp. 307
10.9.3.2 N-Fold Cross Validationp. 307
10.10 Classificationp. 308
10.10.1 Classification as Machine Learningp. 309
10.10.2 Ad Hoc Classificationp. 310
10.11 Heuristics for Classificationp. 311
10.11.1 Feature Weightingp. 311
10.12 Nearest Neighbor Classificationp. 312
10.12.1 Delaunay and Voronoip. 313
10.12.2 Nearest Neighbor Time and Space Issuesp. 315
10.13 Support Vector Machinesp. 315
10.13.1 Linear Discriminationp. 315
10.13.2 Margin of Separationp. 318
10.13.3 Support Vectorsp. 319
10.13.4 The SVM as an Optimization Problemp. 320
10.13.5 The Karush-Kuhn-Tucker Conditionp. 322
10.13.6 Evaluation of w[subscript 0]p. 322
10.14 Linearly Nonseparable Datap. 323
10.14.1 Parameter Valuesp. 326
10.14.2 Evaluation of w[subscript 0] (Soft Margin Case)p. 327
10.14.3 Classification with Soft Marginp. 327
10.15 Support Vector Machines and Kernelsp. 328
10.16 Expected Test Errorp. 328
10.17 Transparencyp. 329
10.18 Exercisesp. 331
Referencesp. 334
Appendicesp. 337
Indexp. 385