Statistical bioinformatics : a guide for life and biomedical science researchers

Publication Information:

Hoboken, NJ : Wiley-Blackwell, c2010

Physical Description:

xiv, 350 p., [20] p. of plates : ill. (some col.) ; 24 cm.

ISBN:

9780471692720

Subject Term:

Bioinformatics -- Statistical methods

Added Author:

Lee, Jae K. (Jae Kyun)

Available:*

Library	Item Barcode	Call Number	Material Type	Item Category 1	Status
PSZ JB	30000010236388	QH324.2 S72 2010	Open Access Book	Book	Unknown

This book provides an essential understanding of statistical concepts necessary for the analysis of genomic and proteomic data using computational techniques. The author presents both basic and advanced topics, focusing on those that are relevant to the computational analysis of large data sets in biology. Chapters begin with a description of a statistical concept and a current example from biomedical research, followed by more detailed presentation, discussion of limitations, and problems. The book starts with an introduction to probability and statistics for genome-wide data, and moves into topics such as clustering, classification, multi-dimensional visualization, experimental design, statistical resampling, and statistical network analysis.

Clearly explains the use of bioinformatics tools in life sciences research without requiring an advanced background in math/statistics.
Enables biomedical and life sciences researchers to successfully evaluate the validity of their results and make inferences.
Enables statistical and quantitative researchers to rapidly learn novel statistical concepts and techniques appropriate for large biological data analysis.
Carefully revisits frequently used statistical approaches and highlights their limitations in large biological data analysis.
Offers programming examples and datasets.
Includes chapter problem sets, a glossary, a list of statistical notations, and appendices with references to background mathematical and technical material.
Features supplementary materials, including datasets, links, and a statistical package available online.

Statistical Bioinformatics is an ideal textbook for students in medicine, life sciences, and bioengineering, and for researchers who use computational tools to analyze genomic, proteomic, and other emerging high-throughput molecular data. It may also serve as a rapid introduction to bioinformatics for statistical and computational students and for readers who have not performed such analyses before.

Author Notes

Jae K. Lee, Ph.D., is a professor of biostatistics and epidemiology in the Department of Health Evaluation Sciences at the University of Virginia School of Medicine, where he designed and teaches a course on Statistical Bioinformatics in Medicine. He earned his doctorate in statistical genetics from the University of Wisconsin, Madison. He was previously a research scientist in the Laboratory of Molecular Pharmacology, National Cancer Institute. Among his current research interests is the integration of statistical and genomic information for the analysis of microarray data.

Preface
Contributors
1 Road to Statistical Bioinformatics
Challenge 1 Multiple-Comparisons Issue
Challenge 2 High-Dimensional Biological Data
Challenge 3 Small-n and Large-p Problem
Challenge 4 Noisy High-Throughput Biological Data
Challenge 5 Integration of Multiple, Heterogeneous Biological Data Information
References
2 Probability Concepts and Distributions for Analyzing Large Biological Data
2.1 Introduction
2.2 Basic Concepts
2.3 Conditional Probability and Independence
2.4 Random Variables
2.5 Expected Value and Variance
2.6 Distributions of Random Variables
2.7 Joint and Marginal Distribution
2.8 Multivariate Distribution
2.9 Sampling Distribution
2.10 Summary
3 Quality Control of High-Throughput Biological Data
3.1 Sources of Error in High-Throughput Biological Experiments
3.2 Statistical Techniques for Quality Control
3.3 Issues Specific to Microarray Gene Expression Experiments
3.4 Conclusion
References
4 Statistical Testing and Significance for Large Biological Data Analysis
4.1 Introduction
4.2 Statistical Testing
4.3 Error Controlling
4.4 Real Data Analysis
4.5 Concluding Remarks
Acknowledgement
References
5 Clustering: Unsupervised Learning in Large Biological Data
5.1 Measure of Similarity
5.2 Clustering
5.3 Assessment of Cluster Quality
5.4 Conclusion
References
6 Classification: Supervised Learning with High-Dimensional Biological Data
6.1 Introduction
6.2 Classification and Prediction Methods
6.3 Feature Selection and Ranking
6.4 Cross-Validation
6.5 Enhancement of Class Prediction by Ensemble Voting Methods
6.6 Comparison of Classification Methods Using High-Dimension Data
6.7 Software Examples for Classification Methods
References
7 Multidimensional Analysis and Visualization on Large Biomedical Data
7.1 Introduction
7.2 Classical Multidimensional Visualization Techniques
7.3 Two-Dimensional Projections
7.4 Issues and Challenges
7.5 Systematic Exploration of Low Dimensional Projections
7.6 One-Dimensional Histogram Ordering
7.7 Two-Dimensional Histogram Ordering
7.8 Conclusion
References
8 Statistical Models, Inferences, and Algorithms for Large Biological Data Analysis
8.1 Introduction
8.2 Statistical/Probabilistic Models
8.3 Estimation Methods
8.4 Numerical Algorithms
8.5 Examples
8.6 Conclusion
References
9 Experimental Designs on High-Throughput Biological Experiments
9.1 Randomization
9.2 Replication
9.3 Pooling
9.4 Blocking
9.5 Design for Classifications
9.6 Design for Time Course Experiments
9.7 Design for eQTL Studies
References
10 Statistical Resampling Techniques for Large Biological Data Analysis
10.1 Introduction
10.2 Resampling Methods for Prediction Error Assessment and Model Selection
10.3 Feature Selection
10.4 Resampling-Based Classification Algorithms
10.5 Practical Example: Lymphoma
10.6 Resampling Methods
10.7 Bootstrap Methods
10.8 Sample Size Issues
10.9 Loss Functions
10.10 Bootstrap Resampling for Quantifying Uncertainty
10.11 Markov Chain Monte Carlo Methods
10.12 Conclusion
References
11 Statistical Network Analysis for Biological Systems and Pathways
11.1 Introduction
11.2 Boolean Network Modeling
11.3 Bayesian Belief Network
11.4 Modeling of Metabolic Networks
References
12 Trends and Statistical Challenges in Genomewide Association Studies
12.1 Introduction
12.2 Alleles, Linkage Disequilibrium, and Haplotype
12.3 International HapMap Project
12.4 Genotyping Platforms
12.5 Overview of Current GWAS Results
12.6 Statistical Issues in GWAS
12.7 Haplotype Analysis
12.8 Homozygosity and Admixture Mapping
12.9 Gene x Gene and Gene x Environmental Interactions
12.10 Gene and Pathway-Based Analysis
12.11 Disease Risk Estimates
12.12 Meta-Analysis
12.13 Rare Variants and Sequence-Based Analysis
12.14 Conclusions
Acknowledgment
References
13 R and Bioconductor Packages in Bioinformatics: Towards Systems Biology
13.1 Introduction
13.2 Brief Overview of the Bioconductor Project
13.3 Experimental Data
13.4 Annotation
13.5 Models of Biological Systems
13.6 Conclusion
13.7 Acknowledgment
References
Index
