Title:

Bioinformatics and computational biology solutions using R and Bioconductor

Series:

Statistics for biology and health

Publication Information:

New York : Springer, 2005

ISBN:

9780387251462

Subject Term:

Bioinformatics

Computational biology

R (Computer program language)

Added Author:

Gentleman, Robert

Available:*

Library	Item Barcode	Call Number	Material Type	Item Category 1	Status
PSZ JB	30000010071206	QH324.2 B564 2005	Open Access Book	Book	Unknown

Bioconductor is a widely used open source and open development software project for the analysis and comprehension of data arising from high-throughput experimentation in genomics and molecular biology. Bioconductor is rooted in the open source statistical computing environment R.

This volume's coverage is broad and ranges across most of the key capabilities of the Bioconductor project, including:

Importation and preprocessing of high-throughput data from microarray, proteomic, and flow cytometry platforms

Curation and delivery of biological metadata for use in statistical modeling and interpretation

Statistical analysis of high-throughput data, including machine learning and visualization

Modeling and visualization of graphs and networks

The developers of the software, who are in many cases leading academic researchers, jointly authored chapters. All methods are illustrated with publicly available data, and a major section of the book is devoted to exposition of fully worked case studies.

This book is more than a static collection of descriptive text, figures, and code examples that were run by the authors to produce the text; it is a dynamic document. Code underlying all of the computations that are shown is made available on a companion website, and readers can reproduce every number, figure, and table on their own computers.

W. Huber and R. A. Irizarry and R. Gentleman; B.M. Bolstad and R.A. Irizarry and L. Gautier and Z. Wu; B.M. Bolstad and F. Collin and J. Brettschneider and K. Simpson and L. Cope and R.A. Irizarry and T.P. Speed; Y.H. Yang and A.C. Paquet; W. Huber and F. Hahne; X. Li and R. Gentleman and X. Lu and Q. Shi and J.D. Iglehart and L. Harris and A. Miron; R. Gentleman and V.J. Carey and J. Zhang; V. J. Carey and D. Temple Lang and J. Gentry and J. Zhang and R. Gentleman; C. A. Smith and W. Huber and R. Gentleman; W. Huber and X. Li and R. Gentleman; V. J. Carey and R. Gentleman; R. Gentleman and B. Ding and S. Dudoit and J. Ibrahim; K. S. Pollard and M. J. van der Laan; D. Scholtens and A. von Heydebreck; K. S. Pollard and S. Dudoit and M. J. van der Laan; V. J. Carey; T. Hothorn and M. Dettling and P. Buhlmann; C. A. Smith; R. Gentleman and W. Huber and V. J. Carey; W. Huber and R. Gentleman and V. J. Carey; V. J. Carey and R. Gentleman and W. Huber and J. Gentry; R. Gentleman and D. Scholtens and B. Ding and V. J. Carey and W. Huber; G. K. Smyth; M. Dettling; R. A. Irizarry

I Preprocessing data from genomic experiments	p. 1
1 Preprocessing Overview	p. 3
1.1 Introduction	p. 3
1.2 Tasks	p. 4
1.3 Data structures	p. 6
1.4 Statistical background	p. 8
1.5 Conclusion	p. 12
2 Preprocessing High-density Oligonucleotide Arrays	p. 13
2.1 Introduction	p. 13
2.2 Importing and accessing probe-level data	p. 15
2.3 Background adjustment and normalization	p. 18
2.4 Summarization	p. 25
2.5 Assessing preprocessing methods	p. 29
2.6 Conclusion	p. 32
3 Quality Assessment of Affymetrix GeneChip Data	p. 33
3.1 Introduction	p. 33
3.2 Exploratory data analysis	p. 34
3.3 Affymetrix quality assessment metrics	p. 37
3.4 RNA degradation	p. 38
3.5 Probe level models	p. 41
3.6 Conclusion	p. 47
4 Preprocessing Two-Color Spotted Arrays	p. 49
4.1 Introduction	p. 49
4.2 Two-color spotted microarrays	p. 50
4.3 Importing and accessing probe-level data	p. 51
4.4 Quality assessment	p. 57
4.5 Normalization	p. 62
4.6 Case study	p. 67
5 Cell-Based Assays	p. 71
5.1 Scope	p. 71
5.2 Experimental technologies	p. 71
5.3 Reading data	p. 73
5.4 Quality assessment and visualization	p. 79
5.5 Detection of effectors	p. 85
6 SELDI-TOF Mass Spectrometry Protein Data	p. 91
6.1 Introduction	p. 91
6.2 Baseline subtraction	p. 93
6.3 Peak detection	p. 95
6.4 Processing a set of calibration spectra	p. 96
6.5 An example	p. 105
6.6 Conclusion	p. 108
II Meta-data: biological annotation and visualization	p. 111
7 Meta-data Resources and Tools in Bioconductor	p. 113
7.1 Introduction	p. 113
7.2 External annotation resources	p. 115
7.3 Bioconductor annotation concepts: curated persistent packages and Web services	p. 116
7.4 The annotate package	p. 119
7.5 Software tools for working with Gene Ontology (GO)	p. 120
7.6 Pathway annotation packages: KEGG and cMAP	p. 125
7.7 Cross-organism annotation: the homology packages	p. 130
7.8 Annotation from other sources	p. 132
7.9 Discussion	p. 133
8 Querying On-line Resources	p. 135
8.1 The Tools	p. 135
8.2 PubMed	p. 138
8.3 KEGG via SOAP	p. 142
8.4 Getting gene sequence information	p. 144
8.5 Conclusion	p. 145
9 Interactive Outputs	p. 147
9.1 Introduction	p. 147
9.2 A simple approach	p. 148
9.3 Using the annaffy package	p. 149
9.4 Linking to On-line Databases	p. 152
9.5 Building HTML pages	p. 153
9.6 Graphical displays with drill-down functionality	p. 156
9.7 Searching Meta-data	p. 159
9.8 Concluding Remarks	p. 160
10 Visualizing Data	p. 161
10.1 Introduction	p. 161
10.2 Practicalities	p. 162
10.3 High-volume scatterplots	p. 163
10.4 Heatmaps	p. 166
10.5 Visualizing distances	p. 170
10.6 Plotting along genomic coordinates	p. 174
10.7 Conclusion	p. 179
III Statistical analysis for genomic experiments	p. 181
11 Analysis Overview	p. 183
11.1 Introduction and road map	p. 183
11.2 Absolute and relative expression measures	p. 185
12 Distance Measures in DNA Microarray Data Analysis	p. 189
12.1 Introduction	p. 189
12.2 Distances	p. 191
12.3 Microarray data	p. 199
12.4 Examples	p. 201
12.5 Discussion	p. 208
13 Cluster Analysis of Genomic Data	p. 209
13.1 Introduction	p. 209
13.2 Methods	p. 210
13.3 Application: renal cell cancer	p. 222
13.4 Conclusion	p. 228
14 Analysis of Differential Gene Expression Studies	p. 229
14.1 Introduction	p. 229
14.2 Differential expression analysis	p. 230
14.3 Multifactor experiments	p. 239
14.4 Conclusion	p. 248
15 Multiple Testing Procedures: the multtest Package and Applications to Genomics	p. 249
15.1 Introduction	p. 249
15.2 Multiple hypothesis testing methodology	p. 250
15.3 Software implementation: R multtest package	p. 259
15.4 Applications: ALL microarray data set	p. 262
15.5 Discussion	p. 270
16 Machine Learning Concepts and Tools for Statistical Genomics	p. 273
16.1 Introduction	p. 273
16.2 Illustration: Two continuous features; decision regions	p. 274
16.3 Methodological issues	p. 276
16.4 Applications	p. 285
16.5 Conclusions	p. 291
17 Ensemble Methods of Computational Inference	p. 293
17.1 Introduction	p. 293
17.2 Bagging and random forests	p. 295
17.3 Boosting	p. 296
17.4 Multiclass problems	p. 298
17.5 Evaluation	p. 298
17.6 Applications: tumor prediction	p. 300
17.7 Applications: Survival analysis	p. 307
17.8 Conclusion	p. 310
18 Browser-based Affymetrix Analysis and Annotation	p. 313
18.1 Introduction	p. 313
18.2 Deploying webbioc	p. 315
18.3 Using webbioc	p. 317
18.4 Extending webbioc	p. 322
18.5 Conclusion	p. 326
IV Graphs and networks	p. 327
19 Introduction and Motivating Examples	p. 329
19.1 Introduction	p. 329
19.2 Practicalities	p. 330
19.3 Motivating examples	p. 331
19.4 Discussion	p. 336
20 Graphs	p. 337
20.1 Overview	p. 337
20.2 Definitions	p. 338
20.3 Cohesive subgroups	p. 344
20.4 Distances	p. 346
21 Bioconductor Software for Graphs	p. 347
21.1 Introduction	p. 347
21.2 The graph package	p. 348
21.3 The RBGL package	p. 352
21.4 Drawing graphs	p. 360
22 Case Studies Using Graphs on Biological Data	p. 369
22.1 Introduction	p. 369
22.2 Comparing the transcriptome and the interactome	p. 370
22.3 Using GO	p. 374
22.4 Literature co-citation	p. 378
22.5 Pathways	p. 387
22.6 Concluding remarks	p. 393
V Case studies	p. 395
23 limma: Linear Models for Microarray Data	p. 397
23.1 Introduction	p. 397
23.2 Data representations	p. 398
23.3 Linear models	p. 399
23.4 Simple comparisons	p. 400
23.5 Technical Replication	p. 403
23.6 Within-array replicate spots	p. 406
23.7 Two groups	p. 407
23.8 Several groups	p. 409
23.9 Direct two-color designs	p. 411
23.10 Factorial designs	p. 412
23.11 Time course experiments	p. 414
23.12 Statistics for differential expression	p. 415
23.13 Fitted model objects	p. 417
23.14 Preprocessing considerations	p. 418
23.15 Conclusion	p. 420
24 Classification with Gene Expression Data	p. 421
24.1 Introduction	p. 421
24.2 Reading and customizing the data	p. 422
24.3 Training and validating classifiers	p. 423
24.4 Multiple random divisions	p. 426
24.5 Classification of test data	p. 428
24.6 Conclusion	p. 429
25 From CEL Files to Annotated Lists of Interesting Genes	p. 431
25.1 Introduction	p. 431
25.2 Reading CEL files	p. 432
25.3 Preprocessing	p. 432
25.4 Ranking and filtering genes	p. 433
25.5 Annotation	p. 438
25.6 Conclusion	p. 442
A Details on selected resources	p. 443
A.1 Data sets	p. 443
A.1.1 ALL	p. 443
A.1.2 Renal cell cancer	p. 443
A.1.3 Estrogen receptor stimulation	p. 443
A.2 URLs for projects mentioned	p. 444
References	p. 445
Index	p. 465
