Skip to:Content
|
Bottom
Cover image for Bioinformatics : problem solving paradigms
Title:
Bioinformatics : problem solving paradigms
Personal Author:
Publication Information:
Berlin : Springer, 2008
Physical Description:
xi, 289 p. : ill. ; 24 cm.
ISBN:
9783540785057

Available:*

Library
Item Barcode
Call Number
Material Type
Item Category 1
Status
Searching...
30000010194963 QH324.2 S63 2008 Open Access Book Book
Searching...

On Order

Summary

Summary

What is this book good for? Imagine you are a computer scientist working in the bioinformatics area. P- bably you will be a member of a highly interdisciplinary team consisting of biologists, chemists, mathematicians, computer scientists ranging from p- grammers to algorithm engineers, and eventually people from various further ?elds. A major problem within such interdisciplinary teams is always to ?nd some common language, and, for each member of some discipline, to have profound knowledge of what are the notions, basic concepts and goals of the other participating disciplines, as well as of what they can contribute to the solutionofonesownproblems. Thisdoes,ofcourse,notmeanthatacomputer scientist should do the job of the biologist. Nevertheless, a computer scientist should be able to understand what a biologist deals with. On the other hand, the biologist should not do the computer scientists job, but should know what computer science and algorithm engineering might contribute to thesolution of her/his problems, and also how problems should be stated in order for the computer scientist to understand them. This book primarily aims to show the potential that algorithm engin- ring o?ers for the solution of core bioinformatics problems.


Table of Contents

1 Core Bioinformatics Problemsp. 1
1.1 DNA Mapping and Sequencingp. 1
1.1.1 Size of Human DNAp. 1
1.1.2 Copying DNA: Polymerase Chain Reaction (PCR)p. 2
1.1.3 Hybridization and Microarraysp. 2
1.1.4 Cutting DNA into Fragmentsp. 2
1.1.5 Prefix Cuttingp. 3
1.1.6 Unique Markersp. 3
1.1.7 Measuring the Length of DNA Moleculesp. 4
1.1.8 Sequencing Short DNA Moleculesp. 4
1.1.9 Mapping Long DNA Moleculesp. 4
1.1.10 Mapping by Single Digestionp. 5
1.1.11 Mapping by Double Digestionp. 6
1.1.12 Mid-sized DNA Molecules: Shotgun Sequencingp. 7
1.2 String Storage and Pattern Matchingp. 8
1.2.1 Complexity of Pattern Matching in Bioinformaticsp. 8
1.2.2 First Look at Suffix Treesp. 9
1.2.3 A Couple of Suffix Tree Applicationsp. 10
1.3 String Alignmentp. 11
1.4 Multiple Alignmentp. 13
1.5 Gene Detectionp. 15
1.6 Gene Comparisonp. 16
1.7 RNA Structure Predictionp. 18
1.8 Protein Structure Predictionp. 19
1.8.1 Holy Grail Protein Structure Predictionp. 19
1.8.2 Secondary Structurep. 20
1.8.3 Tertiary Structure Prediction with Contact Mapsp. 20
1.8.4 HP-Model Approachp. 21
1.8.5 Protein Threadingp. 22
1.9 Molecular Networksp. 23
1.10 Bibliographic Remarksp. 24
2 Turning to Algorithmic Problemsp. 25
2.1 Presentation of Algorithmic Problemsp. 25
2.2 DNA Mappingp. 26
2.2.1 Mapping by Hybridizationp. 26
2.2.2 Mapping by Single Digestionp. 27
2.2.3 Mapping by Double Digestionp. 30
2.3 Shotgun Sequencing and Shortest Common Superstringsp. 31
2.4 Exact Pattern Matchingp. 35
2.4.1 Naive Pattern Matchingp. 35
2.4.2 Knuth-Morris-Pratt Algorithmp. 35
2.4.3 Multi-Pattern Searchp. 37
2.4.4 Multi-Pattern Searchp. 37
2.4.5 Formal Definition of Suffix Treesp. 38
2.5 Soft Pattern Matching = Alignmentp. 41
2.5.1 Alignments Restatedp. 41
2.5.2 Evaluating Alignments by Scoring Functionsp. 41
2.6 Multiple Alignmentp. 43
2.6.1 Sum-of-Pairs Maximizationp. 43
2.6.2 Multiple Alignment Along a Guide Treep. 43
2.6.3 Consensus Line Optimizationp. 45
2.6.4 Steiner Stringp. 45
2.6.5 Equivalence of Consensus Lines and Steiner Stringsp. 45
2.6.6 Generalization to Steiner Trees/Phylogenetic Alignmentp. 47
2.6.7 Profile Alignmentp. 47
2.6.8 Hidden Markov Multiple Alignmentp. 48
2.6.9 Example Heuristic for Obtaining Multiple Alignmentsp. 51
2.7 Gene Detectionp. 52
2.8 Genome Rearrangementp. 53
2.9 RNA Structure Predictionp. 55
2.10 Protein Structure Predictionp. 56
2.10.1 Comparative Approaches Using Neural Networksp. 56
2.10.2 Protein Threadingp. 57
2.11 Bi-Clusteringp. 59
2.12 Bibliographic Remarksp. 60
3 Dynamic Programmingp. 61
3.1 General Principle of Dynamic Programmingp. 61
3.2 Alignmentp. 63
3.2.1 Problem Restatedp. 63
3.2.2 Parameterizationp. 64
3.2.3 Bellman Principlep. 64
3.2.4 Recursive Solutionp. 64
3.2.5 Number of Different Subcalls and Overall Complexityp. 65
3.2.6 Tabular Organization of the Bottom-Up Computationp. 65
3.3 Multiple Alignmentp. 67
3.4 Affine Gap Alignmentp. 67
3.4.1 Problem Restatedp. 67
3.4.2 Parameterization and Conditioningp. 68
3.4.3 Bellman Principlep. 68
3.4.4 Recursive Solutionp. 68
3.4.5 Number of Different Subcalls and Overall Complexityp. 69
3.5 Exon Assemblyp. 69
3.5.1 Problem Restatedp. 69
3.5.2 Parameterization and Conditioningp. 70
3.5.3 Bellman Principlep. 70
3.5.4 Recursive Solutionp. 71
3.5.5 Number of Different Subcalls and Overall Complexityp. 72
3.5.6 Preprocessing of Simple Termsp. 72
3.5.7 Parallel Computation of Recursive Termsp. 72
3.6 RNA Structure Predictionp. 73
3.6.1 Problem Restatedp. 73
3.6.2 Parameterization and Conditioningp. 74
3.6.3 Recursive Solution and Bellman Principlep. 74
3.6.4 Number of Different Subcalls and Overall Complexityp. 75
3.6.5 Variation: Free Energy Minimizationp. 76
3.7 Viterbi Algorithmp. 83
3.7.1 Problem Restatedp. 83
3.7.2 Parameterization and Conditioningp. 83
3.7.3 Bellman Principlep. 83
3.7.4 Recursive Computationp. 83
3.7.5 Counting and Overall Complexityp. 84
3.8 Baum-Welch Algorithmp. 84
3.8.1 Problem Motivationp. 84
3.8.2 A Couple of Important 'Variables'p. 84
3.8.3 Computing Forward Variablesp. 85
3.8.4 Computing Backward Variablesp. 85
3.8.5 Computing Transition Variablesp. 85
3.8.6 Computing State Variablesp. 85
3.8.7 Model Improvementp. 86
3.9 Expressiveness of Viterbi Equationsp. 86
3.10 Lifted Phylogenetic Alignmentp. 91
3.10.1 Problem Restatedp. 91
3.10.2 Parameterization and Conditioningp. 92
3.10.3 Bellman Principlep. 92
3.10.4 Recursive Solutionp. 92
3.10.5 Counting and Overall Complexityp. 93
3.10.6 Preprocessing of Simple Termsp. 93
3.11 Protein Threadingp. 93
3.11.1 Problem Restatedp. 93
3.11.2 Parameterization and Conditioningp. 94
3.11.3 Bellman Principlep. 94
3.11.4 Recursive Solutionp. 94
3.11.5 Counting and Overall Complexityp. 95
3.12 Bibliographic Remarksp. 95
4 Intelligent Data Structuresp. 97
4.1 Representation Mattersp. 97
4.1.1 Intelligent Data Structures Used in Computer Sciencep. 97
4.1.2 Covering a Truncated Boardp. 97
4.1.3 Choose Three Bits and Winp. 98
4.1.4 Check a Proofp. 99
4.2 PQ-Treesp. 101
4.2.1 Basic Notionsp. 101
4.2.2 Transformation rules for marked PQ-Treesp. 103
4.3 Suffix Treesp. 111
4.3.1 Outline of Ukkonen's Algorithm and Basic Proceduresp. 111
4.3.2 Insertion Procedurep. 112
4.3.3 Saving Single Character Insertionsp. 115
4.3.4 Saving Navigation Stepsp. 116
4.3.5 Estimation of the Overall Number of Visited Nodesp. 117
4.3.6 Ukkonen's Theoremp. 120
4.3.7 Common Suffix Tree for Several Stringsp. 123
4.3.8 Applications of Suffix Treesp. 124
4.4 Least Common Ancestorp. 131
4.4.1 Motivationp. 131
4.4.2 Least Common Ancestor in Full Binary Treesp. 131
4.4.3 Least Common Ancestor in Arbitrary Treesp. 135
4.5 Signed Genome Rearrangementp. 144
4.5.1 Reality-Desire Diagrams of Signed Permutationsp. 144
4.5.2 "'Ping-Pong"' of Lower and Upper Boundsp. 148
4.5.3 Padding RD-Diagramsp. 159
4.5.4 Sorting Padded Good Componentsp. 161
4.5.5 Summary of the Sorting Procedurep. 164
4.6 Bibliographic Remarksp. 166
5 NP-Hardness of Core Bioinformatics Problemsp. 167
5.1 Getting Familiar with Basic Notions of Complexity Theoryp. 167
5.2 Cook's Theorem: Primer NP-Complete Problemp. 171
5.3 "Zoo" of NP-Complete Problemsp. 175
5.4 Bridge to Bioinformatics: Shortest Common Super-Sequencep. 188
5.5 NP-Completeness of Core Bioinformatics Problemsp. 198
5.5.1 Multiple Alignmentp. 198
5.5.2 Shortest Common Superstringp. 201
5.5.3 Double Digestp. 204
5.5.4 Protein Threadingp. 204
5.5.5 Bi-Clusteringp. 206
5.5.6 Pseudoknot Predictionp. 209
5.5.7 The Picture of Presented Reductionsp. 213
5.5.8 Further NP-Complete Bioinformatics Problemsp. 213
5.6 Bibliographic Remarksp. 215
6 Approximation Algorithmsp. 217
6.1 Basics of Approximation Algorithmsp. 217
6.1.1 What is an Approximation Algorithm?p. 217
6.1.2 First Example: Metric Travelling Salesmanp. 217
6.1.3 Lower and Upper Boundingp. 218
6.1.4 What are Approximation Algorithms good for?p. 219
6.2 Shortest Common Superstring Problemp. 221
6.2.1 Feasible Lower Boundp. 221
6.2.2 Cycle Covers Related to Perfect Matchings Problemp. 222
6.2.3 Computing Maximum Cost Perfect Matchingsp. 223
6.2.4 Cycle Cover Versus Cyclic Covering by Stringsp. 225
6.2.5 Some Combinatorial Statements on Cyclic Coveringp. 226
6.2.6 4-Approximation Algorithmp. 228
6.2.7 Putting Things Togetherp. 229
6.3 Steiner Stringp. 230
6.3.1 Problem Restatedp. 230
6.3.2 Feasible Lower Boundp. 230
6.3.3 2-Approximation Algorithmp. 231
6.4 Phylogenetic Alignmentp. 232
6.5 Multiple Alignmentp. 234
6.5.1 Feasible Lower Boundp. 234
6.5.2 2-Approximation Algorithmp. 235
6.6 Sorting Unsigned Genomesp. 236
6.6.1 Feasible Lower Boundp. 236
6.6.2 4-Approximation Algorithmp. 237
6.6.3 2-Approximation Algorithmp. 239
6.7 HP-Model Protein Structure Predictionp. 243
6.7.1 Feasible Upper Boundp. 243
6.7.2 Asymptotic 4-Approximation Algorithmp. 245
6.8 Bibliographic Remarksp. 248
7 A Selection of Metaheuristics and Various Projectsp. 249
7.1 Multi-Layer Perceptronp. 249
7.1.1 Architecture of Multi-Layer Perceptronsp. 250
7.1.2 Forward Propagation of Input Signalsp. 251
7.1.3 Input and Output Encodingp. 251
7.1.4 Backpropagation of Error Signalsp. 252
7.1.5 Tuning Backpropagationp. 253
7.1.6 Generalization and Overfittingp. 253
7.1.7 Application to Secondary Structure Predictionp. 254
7.1.8 Further Applicationsp. 254
7.2 Support Vector Machinesp. 255
7.2.1 Architecture of Support Vector Machinesp. 255
7.2.2 Margin Maximizationp. 256
7.2.3 Primal Problemp. 257
7.2.4 Dual Problemp. 258
7.2.5 Kernel Function Trickp. 259
7.2.6 Generalization and Over-Fittingp. 260
7.2.7 Application to Contact Map Predictionp. 261
7.2.8 Soft Margin Maximizationp. 263
7.2.9 Semi-Supervised Support Vector Machinesp. 264
7.3 Hidden Markov Modelsp. 265
7.3.1 Architecture of Hidden Markov Modelsp. 266
7.3.2 Causes and Effectsp. 266
7.3.3 Bioinformatics Application: CG Islandsp. 268
7.3.4 Further Applicationsp. 269
7.4 Genetic Algorithmsp. 269
7.4.1 Basic Characteristics of Genetic Algorithmsp. 269
7.4.2 Application to HP-Fold Optimizationp. 271
7.4.3 Further Applicationsp. 272
7.5 Ant Colony Optimizationp. 272
7.5.1 Basic Characteristics of Ant Colony Algorithmsp. 273
7.5.2 Application to HP-Fold Optimizationp. 275
7.5.3 Further Applicationsp. 276
7.6 Bibliographic Remarksp. 276
Referencesp. 279
Indexp. 285
Go to:Top of Page