Title:
Bioinformatics : the machine learning approach
Personal Author:
Baldi, Pierre
Series:
Adaptive computation and machine learning
Edition:
2nd ed
Publication Information:
Cambridge, MA : MIT Press, 2001
ISBN:
9780262025065
Added Author:
Brunak, Søren

Item Barcode:
30000010127659
Call Number:
QH506 B34 2001
Material Type:
Open Access Book
Item Category 1:
Book

Summary

A guide to machine learning approaches and their application to the analysis of biological data.

An unprecedented wealth of data is being generated by genome sequencing projects and other experimental efforts to determine the structure and function of biological molecules. The demands and opportunities for interpreting these data are expanding rapidly. Bioinformatics is the development and application of computer methods for management, analysis, interpretation, and prediction, as well as for the design of experiments. Machine learning approaches (e.g., neural networks, hidden Markov models, and belief networks) are ideally suited for areas where there is a lot of data but little theory, which is the situation in molecular biology. The goal in machine learning is to extract useful information from a body of data by building good probabilistic models, and to automate the process as much as possible.

In this book Pierre Baldi and Søren Brunak present the key machine learning approaches and apply them to the computational problems encountered in the analysis of biological data. The book is aimed both at biologists and biochemists who need to understand new data-driven algorithms and at those with a primary background in physics, mathematics, statistics, or computer science who need to know more about applications in molecular biology.

This new second edition contains expanded coverage of probabilistic graphical models and of the applications of neural networks, as well as a new chapter on microarrays and gene expression. The entire text has been extensively revised.


Author Notes

Pierre Baldi is Professor of Information and Computer Science and of Biological Chemistry (College of Medicine) and Director of the Institute for Genomics and Bioinformatics at the University of California, Irvine.

Søren Brunak is Professor and Director of the Center for Biological Sequence Analysis at the Technical University of Denmark.


Table of Contents

Series Foreword
Preface
1 Introduction
1.1 Biological Data in Digital Symbol Sequences
1.2 Genomes--Diversity, Size, and Structure
1.3 Proteins and Proteomes
1.4 On the Information Content of Biological Sequences
1.5 Prediction of Molecular Function and Structure
2 Machine Learning Foundations: The Probabilistic Framework
2.1 Introduction: Bayesian Modeling
2.2 The Cox-Jaynes Axioms
2.3 Bayesian Inference and Induction
2.4 Model Structures: Graphical Models and Other Tricks
2.5 Summary
3 Probabilistic Modeling and Inference: Examples
3.1 The Simplest Sequence Models
3.2 Statistical Mechanics
4 Machine Learning Algorithms
4.1 Introduction
4.2 Dynamic Programming
4.3 Gradient Descent
4.4 EM/GEM Algorithms
4.5 Markov Chain Monte Carlo Methods
4.6 Simulated Annealing
4.7 Evolutionary and Genetic Algorithms
4.8 Learning Algorithms: Miscellaneous Aspects
5 Neural Networks: The Theory
5.1 Introduction
5.2 Universal Approximation Properties
5.3 Priors and Likelihoods
5.4 Learning Algorithms: Backpropagation
6 Neural Networks: Applications
6.1 Sequence Encoding and Output Interpretation
6.2 Prediction of Protein Secondary Structure
6.3 Prediction of Signal Peptides and Their Cleavage Sites
6.4 Applications for DNA and RNA Nucleotide Sequences
7 Hidden Markov Models: The Theory
7.1 Introduction
7.2 Prior Information and Initialization
7.3 Likelihood and Basic Algorithms
7.4 Learning Algorithms
7.5 Applications of HMMs: General Aspects
8 Hidden Markov Models: Applications
8.1 Protein Applications
8.2 DNA and RNA Applications
8.3 Conclusion: Advantages and Limitations of HMMs
9 Hybrid Systems: Hidden Markov Models and Neural Networks
9.1 Introduction to Hybrid Models
9.2 The Single-Model Case
9.3 The Multiple-Model Case
9.4 Simulation Results
9.5 Summary
10 Probabilistic Models of Evolution: Phylogenetic Trees
10.1 Introduction to Probabilistic Models of Evolution
10.2 Substitution Probabilities and Evolutionary Rates
10.3 Rates of Evolution
10.4 Data Likelihood
10.5 Optimal Trees and Learning
10.6 Parsimony
10.7 Extensions
11 Stochastic Grammars and Linguistics
11.1 Introduction to Formal Grammars
11.2 Formal Grammars and the Chomsky Hierarchy
11.3 Applications of Grammars to Biological Sequences
11.4 Prior Information and Initialization
11.5 Likelihood
11.6 Learning Algorithms
11.7 Applications of SCFGs
11.8 Experiments
11.9 Future Directions
12 Internet Resources and Public Databases
12.1 A Rapidly Changing Set of Resources
12.2 Databases over Databases and Tools
12.3 Databases over Databases
12.4 Databases
12.5 Sequence Similarity Searches
12.6 Alignment
12.7 Selected Prediction Servers
12.8 Molecular Biology Software Links
12.9 Ph.D. Courses over the Internet
12.10 HMM/NN Simulator
A Statistics
A.1 Decision Theory and Loss Functions
A.2 Quadratic Loss Functions
A.3 The Bias/Variance Trade-off
A.4 Combining Estimators
A.5 Error Bars
A.6 Sufficient Statistics
A.7 Exponential Family
A.8 Gaussian Process Models
A.9 Variational Methods
B Information Theory, Entropy, and Relative Entropy
B.1 Entropy
B.2 Relative Entropy
B.3 Mutual Information
B.4 Jensen's Inequality
B.5 Maximum Entropy
B.6 Minimum Relative Entropy
C Probabilistic Graphical Models
C.1 Notation and Preliminaries
C.2 The Undirected Case: Markov Random Fields
C.3 The Directed Case: Bayesian Networks
D HMM Technicalities, Scaling, Periodic Architectures, State Functions, and Dirichlet Mixtures
D.1 Scaling
D.2 Periodic Architectures
D.3 State Functions: Bendability
D.4 Dirichlet Mixtures
E List of Main Symbols and Abbreviations
References
Index