Available:
Library | Item Barcode | Call Number | Material Type | Item Category | Status |
---|---|---|---|---|---|
 | 30000010210385 | QH324.2 B87 2009 | Open Access Book | Book | On Order |
Summary
The Beauty of Protein Structures and the Mathematics behind Structural Bioinformatics
Providing the framework for a one-semester undergraduate course, Structural Bioinformatics: An Algorithmic Approach shows how to apply key algorithms to solve problems related to macromolecular structure.
Helps Students Go Further in Their Study of Structural Biology
Following some introductory material in the first few chapters, the text solves the longest common subsequence problem using dynamic programming and explains the scientific models underlying the Nussinov and MFOLD algorithms. It then reviews sequence alignment, along with the basic mathematical calculations needed for measuring the geometric properties of macromolecules. After looking at how coordinate transformations facilitate the translation and rotation of molecules in 3D space, the author introduces structural comparison techniques, superposition algorithms, and algorithms that compare relationships within a protein. The final chapter explores how regression and classification are becoming more useful in protein analysis and drug design.
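The longest common subsequence problem mentioned above can be sketched with a short dynamic-programming routine. This is an illustrative sketch, not the book's own code; the function name `lcs_length` and the use of Python are our assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b,
    computed with the classic dynamic-programming recurrence."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop a character from one sequence or the other
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Example: the LCS of "AGGTAB" and "GXTXAYB" is "GTAB", length 4
print(lcs_length("AGGTAB", "GXTXAYB"))
```

The same table-filling pattern, with a different recurrence, reappears in the book's treatment of the Nussinov algorithm and of sequence alignment.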
At the Crossroads of Biology, Mathematics, and Computer Science
Connecting biology, mathematics, and computer science, this practical text presents various bioinformatics topics and problems within a scientific methodology that emphasizes nature (the source of empirical observations), science (the mathematical modeling of the natural process), and computation (the science of calculating predictions and mathematical objects based on mathematical models).
Table of Contents
Preface | p. xix |
Acknowledgments | p. xxiii |
Chapter 1 The Study of Structural Bioinformatics | p. 1 |
1.1 Motivation | p. 1 |
1.2 Small Beginnings | p. 4 |
1.3 Structural Bioinformatics and the Scientific Method | p. 5 |
1.3.1 Three Realms: Nature, Science, and Computation | p. 6 |
1.3.2 Hypothesis, Model, and Theory | p. 8 |
1.3.3 Laws, Postulates, and Assumptions | p. 12 |
1.3.4 Model Theory and Computational Theory | p. 13 |
1.3.5 Different Assumptions for Different Models | p. 14 |
1.4 A More Detailed Problem Analysis: Force Fields | p. 15 |
1.4.1 Nature | p. 16 |
1.4.2 Science | p. 16 |
1.4.2.1 Energy Terms for Bonded Atoms | p. 16 |
1.4.2.2 Energy Terms for Nonbonded Atoms | p. 19 |
1.4.2.3 Total Potential Energy | p. 21 |
1.4.3 Computation | p. 21 |
1.5 Modeling Issues | p. 25 |
1.5.1 Rashomon | p. 26 |
1.5.2 Ockham | p. 26 |
1.5.3 Bellman | p. 27 |
1.5.4 Interpretability | p. 28 |
1.5.5 Refutability | p. 29 |
1.5.6 Complexity and Approximation | p. 29 |
1.6 Sources of Error | p. 32 |
1.7 Summary | p. 33 |
1.8 Exercises | p. 34 |
References | p. 36 |
Chapter 2 Introduction to Macromolecular Structure | p. 37 |
2.1 Motivation | p. 37 |
2.2 Overview of Protein Structure | p. 38 |
2.2.1 Amino Acids and Primary Sequence | p. 38 |
2.2.2 Secondary Structure | p. 44 |
2.2.2.1 Alpha Helices | p. 44 |
2.2.2.2 Beta Strands | p. 47 |
2.2.2.3 Loops | p. 52 |
2.2.3 Tertiary Structure | p. 53 |
2.2.3.1 What Is Tertiary Structure? | p. 54 |
2.2.3.2 The Tertiary Structure of Myoglobin | p. 54 |
2.2.3.3 Tertiary Structure Beyond the Binding Pocket | p. 58 |
2.2.4 Quaternary Structure | p. 64 |
2.2.5 Protein Functionality | p. 67 |
2.2.6 Protein Domains | p. 68 |
2.3 An Overview of RNA Structure | p. 70 |
2.3.1 Nucleotides and RNA Primary Sequence | p. 71 |
2.3.2 RNA Secondary Structure | p. 72 |
2.3.3 RNA Tertiary Structure | p. 75 |
2.4 Exercises | p. 78 |
References | p. 80 |
Chapter 3 Data Sources, Formats, and Applications | p. 83 |
3.1 Motivation | p. 83 |
3.2 Sources of Structural Data | p. 84 |
3.2.1 PDB: The Protein Data Bank | p. 84 |
3.2.2 PDBsum: The PDB Summary | p. 86 |
3.2.3 SCOP: Structural Classification of Proteins | p. 86 |
3.2.4 CATH: The CATH Hierarchy | p. 88 |
3.2.5 PubChem | p. 92 |
3.2.6 DrugBank | p. 94 |
3.3 PDB File Format | p. 95 |
3.4 Visualization of Molecular Data | p. 98 |
3.4.1 Plug-In versus Stand-Alone | p. 99 |
3.4.2 Change of Viewing Perspective | p. 99 |
3.4.3 Graphical Representation | p. 99 |
3.4.4 Visual Effects | p. 101 |
3.4.5 Selection Abilities | p. 101 |
3.4.6 Computational Tools | p. 102 |
3.4.7 Extras | p. 102 |
3.5 Software for Structural Bioinformatics | p. 103 |
3.5.1 PyMOL | p. 103 |
3.5.2 Eclipse | p. 103 |
3.5.3 MarvinSketch | p. 104 |
3.5.4 ACD/ChemSketch | p. 104 |
3.5.5 JOELib2 | p. 105 |
3.5.6 Chemistry Development Kit (CDK) | p. 105 |
3.5.7 BioPython | p. 105 |
3.6 Exercises | p. 106 |
References | p. 109 |
Chapter 4 Dynamic Programming | p. 111 |
4.1 Motivation | p. 111 |
4.2 Introduction | p. 112 |
4.3 A DP Example: The Al Gore Rhythm For Giving Talks | p. 112 |
4.3.1 Problem Statement | p. 112 |
4.3.2 Terminology: Configurations and Scores | p. 113 |
4.3.3 Analysis of Our Given Problem | p. 113 |
4.4 A Recipe for Dynamic Programming | p. 116 |
4.5 Longest Common Subsequence | p. 116 |
4.5.1 Problem Statement | p. 117 |
4.5.2 Prefixes | p. 118 |
4.5.3 Relations Among Subproblems | p. 118 |
4.5.4 A Recurrence for the LCS | p. 119 |
4.6 Exercises | p. 123 |
Chapter 5 RNA Secondary Structure Prediction | p. 125 |
5.1 Motivation | p. 126 |
5.2 Introduction to the Problem | p. 128 |
5.2.1 Nature | p. 129 |
5.2.1.1 Where Do Hydrogen Bonds Form? | p. 129 |
5.2.1.2 Thermodynamic Issues | p. 130 |
5.2.1.3 Consensus Sequence Patterns | p. 132 |
5.2.1.4 Complications | p. 133 |
5.2.2 Science | p. 133 |
5.2.2.1 Modeling Secondary Structure | p. 133 |
5.2.2.2 Single Base Pairs | p. 134 |
5.2.2.3 Stacking Energy Models | p. 134 |
5.2.3 Computation | p. 138 |
5.2.3.1 Display of Secondary Structure | p. 139 |
5.2.4 Restating the Problem | p. 145 |
5.3 The Nussinov Dynamic Programming Algorithm | p. 146 |
5.3.1 Execution Time | p. 155 |
5.4 The MFOLD Algorithm: Terminology | p. 155 |
5.4.1 The MFOLD Algorithm: Recursion | p. 160 |
5.4.2 MFOLD Extensions | p. 162 |
5.4.3 MFOLD Execution Time | p. 162 |
5.5 Exercises | p. 163 |
References | p. 164 |
Chapter 6 Protein Sequence Alignment | p. 167 |
6.1 Protein Homology | p. 167 |
6.1.1 Nature | p. 168 |
6.1.2 Science | p. 170 |
6.1.2.1 Partial Matches | p. 172 |
6.1.2.2 Building a BLOSUM Matrix | p. 173 |
6.1.2.3 Gaps | p. 179 |
6.1.2.4 Summary | p. 180 |
6.1.3 Computation | p. 180 |
6.1.3.1 Subproblem Specification | p. 181 |
6.1.3.2 Scoring Alignments | p. 181 |
6.1.3.3 Suitability of the Subproblem | p. 182 |
6.1.3.4 A Global Alignment Example | p. 186 |
6.2 Variations in the Global Alignment Algorithm | p. 186 |
6.3 The Significance of a Global Alignment | p. 187 |
6.3.1 Computer-Assisted Comparison | p. 188 |
6.3.2 Percentage Identity Comparison | p. 189 |
6.4 Local Alignment | p. 190 |
6.5 Exercises | p. 193 |
References | p. 195 |
Chapter 7 Protein Geometry | p. 197 |
7.1 Motivation | p. 197 |
7.2 Introduction | p. 198 |
7.3 Calculations Related to Protein Geometry | p. 198 |
7.3.1 Interatomic Distance | p. 198 |
7.3.2 Bond Angle | p. 198 |
7.3.3 Dihedral Angles | p. 199 |
7.3.3.1 Defining Dihedral Angles | p. 199 |
7.3.3.2 Computation of a Normal | p. 201 |
7.3.3.3 Calculating the Phi Dihedral Angle | p. 204 |
7.3.3.4 Sign of the Dihedral Angle | p. 204 |
7.3.3.5 Calculating the Psi Dihedral Angle | p. 206 |
7.4 Ramachandran Plots | p. 206 |
7.5 Inertial Axes | p. 212 |
7.6 Exercises | p. 216 |
References | p. 220 |
Chapter 8 Coordinate Transformations | p. 223 |
8.1 Motivation | p. 223 |
8.2 Introduction | p. 224 |
8.3 Translation Transformations | p. 224 |
8.3.1 Translation to Find Centroid at the Origin | p. 224 |
8.4 Rotation Transformations | p. 225 |
8.4.1 Rotation Transformations in the Plane | p. 226 |
8.4.2 Rotations in 3-D Space | p. 227 |
8.5 Isometric Transformations | p. 231 |
8.5.1 Our Setting Is a Euclidean Vector Space | p. 232 |
8.5.2 Orthogonality of A Implies Isometry of T | p. 232 |
8.5.3 Isometry of T Implies Orthogonality of A | p. 233 |
8.5.4 Preservation of Angles | p. 234 |
8.5.5 More Isometries | p. 234 |
8.5.6 Back to Rotations in the Plane | p. 235 |
8.5.7 Rotations in the 3-D Space: A Summary | p. 238 |
8.6 Exercises | p. 238 |
References | p. 239 |
Chapter 9 Structure Comparison, Alignment, and Superposition | p. 241 |
9.1 Motivation | p. 242 |
9.2 Introduction | p. 245 |
9.2.1 Specifying the Problem | p. 245 |
9.3 Techniques for Structural Comparison | p. 246 |
9.4 Scoring Similarities and Optimizing Scores | p. 247 |
9.5 Superposition Algorithms | p. 247 |
9.5.1 Overview | p. 247 |
9.5.2 Characterizing the Superposition Algorithm | p. 249 |
9.5.3 Formal Problem Description | p. 249 |
9.5.4 Computations to Achieve Maximal Overlap | p. 251 |
9.5.5 Summary | p. 257 |
9.5.6 Measuring Overlap | p. 259 |
9.5.6.1 Calculation of the Root Mean Square Deviation (RMSD) | p. 259 |
9.5.6.2 RMSD Issues | p. 259 |
9.5.7 Dealing with Weaker Sequence Similarity | p. 260 |
9.5.8 Strategies Based on a Distance Matrix | p. 261 |
9.6 Algorithms Comparing Relationships within Proteins | p. 263 |
9.6.1 Dali | p. 263 |
9.6.2 SSAP | p. 267 |
9.6.2.1 Motivation | p. 267 |
9.6.2.2 Introduction to SSAP | p. 269 |
9.6.2.3 Overview of SSAP | p. 271 |
9.6.2.4 Calculating the Views | p. 272 |
9.6.2.5 Building the Consensus Matrix | p. 272 |
9.6.2.6 Compute the Optimal Path in the Consensus Matrix | p. 278 |
9.7 Exercises | p. 279 |
References | p. 282 |
Chapter 10 Machine Learning | p. 285 |
10.1 Motivation | p. 285 |
10.2 Issues of Complexity | p. 287 |
10.2.1 Computational Scalability | p. 287 |
10.2.2 Intrinsic Complexity | p. 287 |
10.2.3 Inadequate Knowledge | p. 288 |
10.3 Prediction Via Machine Learning | p. 289 |
10.3.1 Training and Testing | p. 291 |
10.4 Types of Learning | p. 292 |
10.4.1 Types of Supervised Learning | p. 293 |
10.4.2 Supervised Learning: Notation and Formal Definitions | p. 293 |
10.5 Objectives of the Learning Algorithm | p. 294 |
10.6 Linear Regression | p. 295 |
10.7 Ridge Regression | p. 297 |
10.7.1 Predictors and Data Recording | p. 299 |
10.7.2 Underfitting and Overfitting | p. 300 |
10.8 Preamble for Kernel Methods | p. 300 |
10.9 Kernel Functions | p. 303 |
10.9.1 The "Kernel Trick" | p. 304 |
10.9.2 Design Issues | p. 305 |
10.9.3 Validation Data Sets | p. 306 |
10.9.3.1 Holdout Validation | p. 307 |
10.9.3.2 N-Fold Cross Validation | p. 307 |
10.10 Classification | p. 308 |
10.10.1 Classification as Machine Learning | p. 309 |
10.10.2 Ad Hoc Classification | p. 310 |
10.11 Heuristics for Classification | p. 311 |
10.11.1 Feature Weighting | p. 311 |
10.12 Nearest Neighbor Classification | p. 312 |
10.12.1 Delaunay and Voronoi | p. 313 |
10.12.2 Nearest Neighbor Time and Space Issues | p. 315 |
10.13 Support Vector Machines | p. 315 |
10.13.1 Linear Discrimination | p. 315 |
10.13.2 Margin of Separation | p. 318 |
10.13.3 Support Vectors | p. 319 |
10.13.4 The SVM as an Optimization Problem | p. 320 |
10.13.5 The Karush-Kuhn-Tucker Condition | p. 322 |
10.13.6 Evaluation of w₀ | p. 322 |
10.14 Linearly Nonseparable Data | p. 323 |
10.14.1 Parameter Values | p. 326 |
10.14.2 Evaluation of w₀ (Soft Margin Case) | p. 327 |
10.14.3 Classification with Soft Margin | p. 327 |
10.15 Support Vector Machines and Kernels | p. 328 |
10.16 Expected Test Error | p. 328 |
10.17 Transparency | p. 329 |
10.18 Exercises | p. 331 |
References | p. 334 |
Appendices | p. 337 |
Index | p. 385 |