Title: Machine learning approaches to bioinformatics
Personal Author:
Series: Science, engineering, and biology informatics ; 4
Publication Information: Hackensack, NJ : World Scientific Publishing, 2010
Physical Description: xiv, 322 p. : ill. ; 24 cm.
ISBN: 9789814287302

Available:
Item Barcode: 30000010270369
Call Number: QH324.25 Y26 2010
Material Type: Open Access Book
Item Category 1: Book

Summary

This book covers a wide range of topics in applying machine learning approaches to bioinformatics projects. It has two distinguishing features. First, it introduces the machine learning approaches most widely used in bioinformatics and discusses, with evaluations from real case studies, how they are applied in individual bioinformatics projects. Second, it introduces state-of-the-art bioinformatics research methods. The theoretical and practical parts are closely integrated so that readers can follow established procedures in their own research. Unlike most bioinformatics books on the market, its coverage is not limited to a single subject. It brings together a broad spectrum of relevant topics, including systematic data mining and computational systems biology research, which makes it well suited to teaching. The book is an essential reference for final-year undergraduates and graduate students and a comprehensive handbook for new researchers. It will also serve as a practical guide to software development in bioinformatics projects.


Table of Contents

Preface p. v
1 Introduction p. 1
1.1 Brief history of bioinformatics p. 3
1.2 Database application in bioinformatics p. 6
1.3 Web tools and services for sequence homology alignment p. 8
1.3.1 Web tools and services for protein functional site identification p. 9
1.3.2 Web tools and services for other biological data p. 10
1.4 Pattern analysis p. 10
1.5 The contribution of information technology p. 11
1.6 Chapters p. 12
2 Introduction to Unsupervised Learning p. 15
3 Probability Density Estimation Approaches p. 24
3.1 Histogram approach p. 24
3.2 Parametric approach p. 25
3.3 Non-parametric approach p. 28
3.3.1 K-nearest neighbour approach p. 28
3.3.2 Kernel approach p. 29
Summary p. 36
4 Dimension Reduction p. 38
4.1 General p. 38
4.2 Principal component analysis p. 39
4.3 An application of PCA p. 42
4.4 Multi-dimensional scaling p. 46
4.5 Application of the Sammon algorithm to gene data p. 48
Summary p. 50
5 Cluster Analysis p. 52
5.1 Hierarchical clustering p. 52
5.2 K-means p. 55
5.3 Fuzzy C-means p. 58
5.4 Gaussian mixture models p. 60
5.5 Application of clustering algorithms to the Burkholderia pseudomallei gene expression data p. 64
Summary p. 67
6 Self-organising Map p. 69
6.1 Vector quantization p. 69
6.2 SOM structure p. 73
6.3 SOM learning algorithm p. 75
6.4 Using SOM for classification p. 79
6.5 Bioinformatics applications of VQ and SOM p. 81
6.5.1 Sequence analysis p. 81
6.5.2 Gene expression data analysis p. 83
6.5.3 Metabolite data analysis p. 86
6.6 A case study of gene expression data analysis p. 86
6.7 A case study of sequence data analysis p. 88
Summary p. 90
7 Introduction to Supervised Learning p. 92
7.1 General concepts p. 92
7.2 General definition p. 94
7.3 Model evaluation p. 96
7.4 Data organisation p. 101
7.5 Bayes rule for classification p. 103
Summary p. 103
8 Linear/Quadratic Discriminant Analysis and K-nearest Neighbour p. 104
8.1 Linear discriminant analysis p. 104
8.2 Generalised discriminant analysis p. 109
8.3 K-nearest neighbour p. 111
8.4 KNN for gene data analysis p. 118
Summary p. 118
9 Classification and Regression Trees, Random Forest Algorithm p. 120
9.1 Introduction p. 120
9.2 Basic principle for constructing a classification tree p. 121
9.3 Classification and regression tree p. 125
9.4 CART for compound pathway involvement prediction p. 126
9.5 The random forest algorithm p. 128
9.6 RF for analyzing Burkholderia pseudomallei gene expression profiles p. 129
Summary p. 132
10 Multi-layer Perceptron p. 133
10.1 Introduction p. 133
10.2 Learning theory p. 137
10.2.1 Parameterization of a neural network p. 137
10.2.2 Learning rules p. 137
10.3 Learning algorithms p. 145
10.3.1 Regression p. 145
10.3.2 Classification p. 146
10.3.3 Procedure p. 147
10.4 Applications to bioinformatics p. 148
10.4.1 Bio-chemical data analysis p. 148
10.4.2 Gene expression data analysis p. 149
10.4.3 Protein structure data analysis p. 149
10.4.4 Bio-marker identification p. 150
10.5 A case study on Burkholderia pseudomallei gene expression data p. 150
Summary p. 153
11 Basis Function Approach and Vector Machines p. 154
11.1 Introduction p. 154
11.2 Radial-basis function neural network (RBFNN) p. 156
11.3 Bio-basis function neural network p. 162
11.4 Support vector machine p. 168
11.5 Relevance vector machine p. 173
Summary p. 176
12 Hidden Markov Model p. 177
12.1 Markov model p. 177
12.2 Hidden Markov model p. 179
12.2.1 General definition p. 179
12.2.2 Handling HMM p. 183
12.2.3 Evaluation p. 184
12.2.4 Decoding p. 188
12.2.5 Learning p. 189
12.3 HMM for sequence classification p. 191
Summary p. 194
13 Feature Selection p. 195
13.1 Built-in strategy p. 195
13.1.1 Lasso regression p. 196
13.1.2 Ridge regression p. 199
13.1.3 Partial least square regression (PLS) algorithm p. 200
13.2 Exhaustive strategy p. 204
13.3 Heuristic strategy - orthogonal least square approach p. 204
13.4 Criteria for feature selection p. 208
13.4.1 Correlation measure p. 209
13.4.2 Fisher ratio measure p. 210
13.4.3 Mutual information approach p. 210
Summary p. 212
14 Feature Extraction (Biological Data Coding) p. 213
14.1 Molecular sequences p. 214
14.2 Chemical compounds p. 215
14.3 General definition p. 216
14.4 Sequence analysis p. 216
14.4.1 Peptide feature extraction p. 216
14.4.2 Whole sequence feature extraction p. 222
Summary p. 224
15 Sequence/Structural Bioinformatics Foundation - Peptide Classification p. 225
15.1 Nitration site prediction p. 225
15.2 Plant promoter region prediction p. 230
Summary p. 237
16 Gene Network - Causal Network and Bayesian Networks p. 238
16.1 Gene regulatory network p. 238
16.2 Causal networks, networks, graphs p. 241
16.3 A brief review of the probability p. 242
16.4 Discrete Bayesian network p. 245
16.5 Inference with discrete Bayesian network p. 246
16.6 Learning discrete Bayesian network p. 247
16.7 Bayesian networks for gene regulatory networks p. 247
16.8 Bayesian networks for discovering peptide patterns p. 248
16.9 Bayesian networks for analysing Burkholderia pseudomallei gene data p. 249
Summary p. 252
17 S-Systems p. 253
17.1 Michaelis-Menten change law p. 253
17.2 S-System p. 256
17.3 Simplification of an S-system p. 259
17.4 Approaches for structure identification and parameter estimation p. 260
17.4.1 Neural network approach p. 260
17.4.2 Simulated annealing approach p. 261
17.4.3 Evolutionary computation approach p. 262
17.5 Steady-state analysis of an S-system p. 262
17.6 Sensitivity of an S-system p. 267
Summary p. 268
18 Future Directions p. 269
18.1 Multi-source data p. 270
18.2 Gene regulatory network construction p. 272
18.3 Building models using incomplete data p. 274
18.4 Biomarker detection from gene expression data p. 275
Summary p. 278
References p. 279
Index p. 319