Title:
Statistical bioinformatics : a guide for life and biomedical science researchers
Publication Information:
Hoboken, NJ : Wiley-Blackwell, c2010
Physical Description:
xiv, 350 p., [20] p. of plates : ill. (some col.) ; 24 cm.
ISBN:
9780471692720
Added Author:

Available:

Item Barcode:
30000010236388
Call Number:
QH324.2 S72 2010
Material Type:
Open Access Book
Item Category 1:
Book

Summary

This book provides an essential understanding of the statistical concepts needed to analyze genomic and proteomic data with computational techniques. The author presents both basic and advanced topics, focusing on those relevant to the computational analysis of large biological data sets. Each chapter begins with a description of a statistical concept and a current example from biomedical research, followed by a more detailed presentation, a discussion of limitations, and problems. The book starts with an introduction to probability and statistics for genome-wide data and moves on to topics such as clustering, classification, multidimensional visualization, experimental design, statistical resampling, and statistical network analysis.

- Clearly explains the use of bioinformatics tools in life sciences research without requiring an advanced background in math or statistics
- Enables biomedical and life sciences researchers to successfully evaluate the validity of their results and make inferences
- Enables statistical and quantitative researchers to rapidly learn novel statistical concepts and techniques suited to large biological data analysis
- Carefully revisits frequently used statistical approaches and highlights their limitations in large biological data analysis
- Offers programming examples and datasets
- Includes chapter problem sets, a glossary, a list of statistical notations, and appendices with references to background mathematical and technical material
- Features supplementary materials available online, including datasets, links, and a statistical package

Statistical Bioinformatics is an ideal textbook for students in medicine, the life sciences, and bioengineering, and for researchers who use computational tools to analyze genomic, proteomic, and other emerging high-throughput molecular data. It may also serve as a rapid introduction to bioinformatics for statistics and computer science students, and for other readers who have not performed such analysis tasks before.


Author Notes

Jae K. Lee, Ph.D., is a professor of biostatistics and epidemiology in the Department of Health Evaluation Sciences at the University of Virginia School of Medicine, where he designed and teaches a course on Statistical Bioinformatics in Medicine. He earned his doctorate in statistical genetics from the University of Wisconsin, Madison. He was previously a research scientist in the Laboratory of Molecular Pharmacology, National Cancer Institute. His current research interests include the integration of statistical and genomic information for the analysis of microarray data.


Table of Contents

Preface p. xi
Contributors p. xiii
1 Road to Statistical Bioinformatics p. 1
Challenge 1 Multiple-Comparisons Issue p. 1
Challenge 2 High-Dimensional Biological Data p. 2
Challenge 3 Small-n and Large-p Problem p. 3
Challenge 4 Noisy High-Throughput Biological Data p. 3
Challenge 5 Integration of Multiple, Heterogeneous Biological Data Information
References p. 5
2 Probability Concepts and Distributions for Analyzing Large Biological Data p. 7
2.1 Introduction p. 7
2.2 Basic Concepts p. 8
2.3 Conditional Probability and Independence p. 10
2.4 Random Variables p. 13
2.5 Expected Value and Variance p. 15
2.6 Distributions of Random Variables p. 19
2.7 Joint and Marginal Distribution p. 39
2.8 Multivariate Distribution p. 42
2.9 Sampling Distribution p. 46
2.10 Summary p. 54
3 Quality Control of High-Throughput Biological Data p. 57
3.1 Sources of Error in High-Throughput Biological Experiments p. 57
3.2 Statistical Techniques for Quality Control p. 59
3.3 Issues Specific to Microarray Gene Expression Experiments p. 66
3.4 Conclusion p. 69
References p. 69
4 Statistical Testing and Significance for Large Biological Data Analysis p. 71
4.1 Introduction p. 71
4.2 Statistical Testing p. 72
4.3 Error Controlling p. 78
4.4 Real Data Analysis p. 81
4.5 Concluding Remarks p. 87
Acknowledgement p. 87
References p. 87
5 Clustering: Unsupervised Learning in Large Biological Data p. 89
5.1 Measure of Similarity p. 90
5.2 Clustering p. 99
5.3 Assessment of Cluster Quality p. 115
5.4 Conclusion p. 123
References p. 123
6 Classification: Supervised Learning with High-Dimensional Biological Data p. 129
6.1 Introduction p. 129
6.2 Classification and Prediction Methods p. 132
6.3 Feature Selection and Ranking p. 140
6.4 Cross-Validation p. 144
6.5 Enhancement of Class Prediction by Ensemble Voting Methods p. 145
6.6 Comparison of Classification Methods Using High-Dimension Data p. 147
6.7 Software Examples for Classification Methods p. 150
References p. 154
7 Multidimensional Analysis and Visualization on Large Biomedical Data p. 157
7.1 Introduction p. 157
7.2 Classical Multidimensional Visualization Techniques p. 158
7.3 Two-Dimensional Projections p. 161
7.4 Issues and Challenges p. 165
7.5 Systematic Exploration of Low Dimensional Projections p. 166
7.6 One-Dimensional Histogram Ordering p. 170
7.7 Two-Dimensional Histogram Ordering p. 174
7.8 Conclusion p. 181
References p. 182
8 Statistical Models, Inferences, and Algorithms for Large Biological Data Analysis p. 185
8.1 Introduction p. 185
8.2 Statistical/Probabilistic Models p. 187
8.3 Estimation Methods p. 189
8.4 Numerical Algorithms p. 191
8.5 Examples p. 192
8.9 Conclusion p. 198
References p. 199
9 Experimental Designs on High-Throughput Biological Experiments p. 201
9.1 Randomization p. 201
9.2 Replication p. 202
9.3 Pooling p. 209
9.4 Blocking p. 210
9.5 Design for Classifications p. 214
9.6 Design for Time Course Experiments p. 215
9.7 Design for eQTL Studies p. 215
References p. 216
10 Statistical Resampling Techniques for Large Biological Data Analysis p. 219
10.1 Introduction p. 219
10.2 Resampling Methods for Prediction Error Assessment and Model Selection p. 221
10.3 Feature Selection p. 225
10.4 Resampling-Based Classification Algorithms p. 226
10.5 Practical Example: Lymphoma p. 226
10.6 Resampling Methods p. 227
10.7 Bootstrap Methods p. 232
10.8 Sample Size Issues p. 233
10.9 Loss Functions p. 235
10.10 Bootstrap Resampling for Quantifying Uncertainty p. 236
10.11 Markov Chain Monte Carlo Methods p. 238
10.12 Conclusion p. 240
References p. 247
11 Statistical Network Analysis for Biological Systems and Pathways p. 249
11.1 Introduction p. 249
11.2 Boolean Network Modeling p. 250
11.3 Bayesian Belief Network p. 259
11.4 Modeling of Metabolic Networks p. 273
References p. 279
12 Trends and Statistical Challenges in Genomewide Association Studies p. 283
12.1 Introduction p. 283
12.2 Alleles, Linkage Disequilibrium, and Haplotype p. 283
12.3 International HapMap Project p. 285
12.4 Genotyping Platforms p. 286
12.5 Overview of Current GWAS Results p. 287
12.6 Statistical Issues in GWAS p. 290
12.7 Haplotype Analysis p. 296
12.8 Homozygosity and Admixture Mapping p. 298
12.9 Gene × Gene and Gene × Environmental Interactions p. 298
12.10 Gene and Pathway-Based Analysis p. 299
12.11 Disease Risk Estimates p. 301
12.12 Meta-Analysis p. 301
12.13 Rare Variants and Sequence-Based Analysis p. 302
12.14 Conclusions p. 303
Acknowledgment p. 303
References p. 303
13 R and Bioconductor Packages in Bioinformatics: Towards Systems Biology p. 309
13.1 Introduction p. 309
13.2 Brief Overview of the Bioconductor Project p. 310
13.3 Experimental Data p. 311
13.4 Annotation p. 318
13.5 Models of Biological Systems p. 328
13.6 Conclusion p. 335
13.7 Acknowledgment p. 336
References p. 336
Index p. 339