Title:
The analysis of gene expression data : methods and software
Series:
Statistics for biology and health
Publication Information:
New York, NY : Springer, 2003
ISBN:
9780387955773
Item Barcode:
30000010119440
Call Number:
QP624.5.D726 A52 2003
Material Type:
Open Access Book
Item Category 1:
Book

Summary

The development of technologies for high-throughput measurement of gene expression in biological systems is providing powerful new tools for investigating the transcriptome on a genomic scale, and across diverse biological systems and experimental designs. This technological transformation is generating an increasing demand for data analysis in biological investigations of gene expression. This book focuses on data analysis of gene expression microarrays. The goal is to provide guidance to practitioners in deciding which statistical approaches and packages may be indicated for their projects, in choosing among the various options provided by those packages, and in correctly interpreting the results.

The book is a collection of chapters written by authors of statistical software for microarray data analysis. Each chapter describes the conceptual and methodological underpinning of data analysis tools as well as their software implementation, and will enable readers to both understand and implement an analysis approach. Methods touch on all aspects of statistical analysis of microarrays, from annotation and filtering to clustering and classification. All software packages described are free to academic users.

The materials presented cover a range of software tools designed for varied audiences. Some chapters describe simple menu-driven software in a user-friendly fashion and are designed to be accessible to microarray data analysts without formal quantitative training. Most chapters are directed at microarray data analysts with master's-level training in computer science, biostatistics, or bioinformatics. A minority of more advanced chapters are intended for doctoral students and researchers.


Table of Contents

Giovanni Parmigiani, Elizabeth S. Garrett, Rafael A. Irizarry, and Scott L. Zeger
Robert Gentleman and Vincent Carey
Sandrine Dudoit and Jean Yee Hwa Yang
Rafael A. Irizarry, Laurent Gautier, and Leslie M. Cope
Cheng Li and Wing Hung Wong
Jaak Vilo, Misha Kapushesky, Patrick Kemmeren, Ugis Sarkans, and Alvis Brazma
Jae K. Lee and Michael O'Connell
Christopher M.L.S. Bouton, George Henry, Carlo Colantuoni, and Jonathan Pevsner
Carlo Colantuoni, George Henry, Christopher M.L.S. Bouton, Scott L. Zeger, and Jonathan Pevsner
Peter F. Lemkin, Gregory C. Thornwall, and Jai Evans
Michael A. Newton and Christina Kendziorski
John D. Storey and Robert Tibshirani
Yi Lin, Samuel T. Nadler, Hong Lan, Alan D. Attie, and Brian S. Yandell
Hao Wu, M. Kathleen Kerr, Xiangqin Cui, and Gary A. Churchill
Kim-Anh Do, Bradley Broom, and Sijin Wen
Elizabeth S. Garrett and Giovanni Parmigiani
Michael F. Ochs
Paola Sebastiani, Marco Ramoni, and Isaac S. Kohane
Atul J. Butte and Isaac S. Kohane

Preface
Contributors
Color Insert
1 The Analysis of Gene Expression Data: An Overview of Methods and Software
1.1 Measuring Gene Expression Using Microarrays
1.1.1 Microarray Technologies
1.1.2 Sources of Variation in Gene Expression Measurements Using Microarrays
1.1.3 Phases of Microarray Data Analysis
1.2 Design of Microarray Experiments
1.2.1 Replication and Sample Size Considerations
1.2.2 Design of Two-Channel Arrays
1.3 Data Storage
1.3.1 Databases
1.3.2 Standards
1.3.3 Statistical Analysis Languages
1.4 Preprocessing
1.4.1 Image Analysis
1.4.2 Visualizations for Quality Control
1.4.3 Background Subtraction
1.4.4 Probe-level Analysis of Oligonucleotide Arrays
1.4.5 Within-Array Normalization of cDNA Arrays
1.4.6 Normalization Across Arrays
1.5 Screening for Differentially Expressed Genes
1.5.1 Estimation or Selection?
1.5.2 One Problem or Many?
1.5.3 Selection and False Discovery Rates
1.5.4 Beyond Two Groups
1.6 Challenges of Genome Biometry Analyses
1.7 Visualization and Unsupervised Analyses
1.7.1 Profile Visualization
1.7.2 Why Clustering?
1.7.3 Hierarchical Clustering
1.7.4 k-Means Clustering and Self-Organizing Maps
1.7.5 Model-Based Clustering
1.7.6 Principal Components Analysis
1.7.7 Multidimensional Scaling
1.7.8 Identifying Novel Molecular Subclasses
1.7.9 Time Series Analysis
1.8 Prediction
1.8.1 Prediction Tools
1.8.2 Dimension Reduction
1.8.3 Evaluation of Classifiers
1.8.4 Regression-Based Approaches
1.8.5 Classification Trees
1.8.6 Probabilistic Model-Based Classification
1.8.7 Discriminant Analysis
1.8.8 Nearest-Neighbor Classifiers
1.8.9 Support Vector Machines
1.9 Free and Open-Source Software
1.9.1 Whitehead Institute Tools
1.9.2 Eisen Lab Tools
1.9.3 TIGR Tools
1.9.4 GeneX and CyberT
1.9.5 Projects and NCBI
1.9.6 BRB
1.9.7 The OOML library
1.9.8 MatArray
1.9.9 BASE
1.10 Conclusion
2 Visualization and Annotation of Genomic Experiments
2.1 Introduction
2.2 Motivations for Component-Based Software
2.3 Formalism
2.4 Bioconductor Software for Filtering, Exploring, and Interpreting Microarray Experiments
2.4.1 Formal Data Structures and Methods for Multiple Microarrays
2.4.2 Tools for Filtering Gene Expression Data: The Closure Concept
2.4.3 Expression Density Diagnostics: High-Throughput Exploratory Data Analysis for Microarrays
2.4.4 Annotation
2.5 Visualization
2.5.1 Chromosomes
2.6 Applications
2.6.1 A Case Study of Gene Filtering
2.6.2 Application of Expression Density Diagnostics
2.7 Conclusions
3 Bioconductor R Packages for Exploratory Analysis and Normalization of cDNA Microarray Data
3.1 Introduction
3.1.1 Overview of Packages
3.1.2 Two-Color cDNA Microarray Experiments
3.2 Methods
3.2.1 Standards for Microarray Data
3.2.2 Object-Oriented Programming: Microarray Classes and Methods
3.2.3 Diagnostic Plots
3.2.4 Normalization Using Robust Local Regression
3.3 Application: Swirl Microarray Experiment
3.4 Software
3.4.1 Package marrayClasses--Classes and Methods for cDNA Microarray Data
3.4.2 Package marrayInput--Data Input for cDNA Microarrays
3.4.3 Package marrayPlots--Diagnostic Plots for cDNA Microarray Data
3.4.4 Package marrayNorm--Location and Scale Normalization for cDNA Microarray Data
3.5 Discussion
4 An R Package for Analyses of Affymetrix Oligonucleotide Arrays
4.1 Introduction
4.2 Methods
4.2.1 Notation
4.2.2 The CEL/CDF Convention
4.2.3 Probe Pair Sets
4.2.4 Probe-Level Objects
4.2.5 Normalization
4.2.6 Exploratory Data Analysis of Probe-Level Data
4.3 Application
4.3.1 Expression Measures
4.4 Software
4.4.1 A Case Study
4.4.2 Extending the Package
4.5 Conclusion
5 DNA-Chip Analyzer (dChip)
5.1 Introduction
5.2 Methods
5.2.1 Normalization of Arrays Based on an "Invariant Set"
5.2.2 Model-Based Analysis of Oligonucleotide Arrays
5.2.3 Confidence Interval for Fold Change
5.2.4 Pooling Replicate Arrays Considering Measurement Accuracy
5.3 Software and Applications
5.3.1 Reading in Array Data Files
5.3.2 Viewing an Array Image
5.3.3 Normalizing Arrays
5.3.4 Viewing PM/MM Data
5.3.5 Calculating Model-Based Expression Indexes
5.3.6 Filter Genes
5.3.7 Hierarchical Clustering
5.3.8 Comparing Samples
5.3.9 Mapping Genes to Chromosomes
5.3.10 Sample Classification by Linear Discriminant Analysis
5.4 Discussion
6 Expression Profiler
6.1 Introduction
6.2 EPCLUST
6.2.1 EPCLUST: Data Import
6.2.2 EPCLUST: Data Filtering
6.2.3 EPCLUST: Data Annotation
6.2.4 EPCLUST: Data Environment
6.2.5 EPCLUST: Data Analysis
6.3 URLMAP: Cross-Linking of the Analysis Results Between the Tools and Databases
6.4 EP:GO GeneOntology Browser
6.5 EP:PPI: Comparison of Protein Pairs and Expression
6.6 Pattern Discovery, Pattern Matching, and Visualization Tools
6.7 An Example of the Data Analysis and Visualizations Performed by the Tools in Expression Profiler
6.8 Integration of Expression Profiler with Public Microarray Databases
6.9 Conclusions
7 An S-PLUS Library for the Analysis and Visualization of Differential Expression
7.1 Introduction
7.2 Assessment of Differential Expression
7.2.1 Local Pooled Error
7.2.2 Tests for Differential Expression
7.2.3 Cluster Analysis and Visualization
7.3 Analysis of Melanoma Expression
7.3.1 Tests for Differential Expression
7.3.2 Cluster Analysis and Visualization
7.3.3 Annotation
7.4 Discussion
8 DRAGON and DRAGON View: Methods for the Annotation, Analysis, and Visualization of Large-Scale Gene Expression Data
8.1 Introduction
8.2 System and Methods
8.2.1 Overview of DRAGON
8.2.2 DRAGON's Hardware, Software, and Database Architecture
8.2.3 Cross-Referencing Information in DRAGON
8.2.4 The DRAGON Search and Annotate Tools
8.2.5 The DRAGON View Data Visualization Tools
8.2.6 DRAGON Gram: A Novel Visualization Tool
8.3 Implementation
8.4 Discussion and Conclusion
9 SNOMAD: Biologist-Friendly Web Tools for the Standardization and NOrmalization of Microarray Data
9.1 Introduction
9.2 Methods and Application
9.2.1 Overview of Experimental and Data Analysis Procedures
9.2.2 Background Subtraction
9.2.3 Global Mean Normalization
9.2.4 Standard Data Transformation and Visualization Methods
9.2.5 Local Mean Normalization Across Element Signal Intensity
9.2.6 Local Variance Correction Across Element Signal Intensity
9.2.7 Local Mean Normalization Across the Microarray Surface
9.3 Software
9.4 Discussion
10 Microarray Analysis Using the MicroArray Explorer
10.1 Introduction
10.1.1 Need for the Methodology
10.1.2 Basic Ideas Behind the Approach
10.2 Methods--Statistical and Informatics Basis
10.2.1 Analysis Paradigm
10.2.2 Particular Analysis Methods
10.2.3 Data Conversion
10.3 Software
10.3.1 System Design--Software Implementation
10.3.2 How to Download the Software
10.3.3 Strengths and Weaknesses of the Approach
10.4 Applications
10.5 Discussion
11 Parametric Empirical Bayes Methods for Microarrays
11.1 Introduction
11.2 EB Methods
11.2.1 Canonical EB Example
11.2.2 General Model Structure: Two Conditions
11.2.3 Multiple Conditions
11.2.4 The Gamma-Gamma and Lognormal-Normal Models
11.2.5 Model Fitting
11.3 Software
11.4 Application
11.5 Discussion
12 SAM Thresholding and False Discovery Rates for Detecting Differential Gene Expression in DNA Microarrays
12.1 Introduction
12.2 Methods and Applications
12.2.1 Multiple Hypothesis Testing
12.2.2 An Application
12.2.3 Forming the Test Statistics
12.2.4 Calculating the Null Distribution
12.2.5 The SAM Thresholding Procedure
12.2.6 Estimating False-Discovery Rates
12.3 Software
12.3.1 Obtaining the Software
12.3.2 Data Formats
12.3.3 Response Format
12.3.4 Example Input Data File for an Unpaired Problem
12.3.5 Block Permutations
12.3.6 Normalization of Experiments
12.3.7 Handling Missing Data
12.3.8 Running SAM
12.3.9 Format of the Significant Gene List
12.4 Discussion
13 Adaptive Gene Picking with Microarray Data: Detecting Important Low Abundance Signals
13.1 Introduction
13.2 Methods
13.2.1 Background Subtraction
13.2.2 Transformation to Approximate Normality
13.2.3 Differential Expression Across Conditions
13.2.4 Robust Center and Spread
13.2.5 Formal Evaluation of Significant Differential Expression
13.2.6 Simulation Studies
13.2.7 Comparison of Methods with E. coli Data
13.3 Software
13.4 Application
13.4.1 Diabetes and Obesity Studies
13.4.2 Software Example
14 MAANOVA: A Software Package for the Analysis of Spotted cDNA Microarray Experiments
14.1 Introduction
14.2 Methods
14.2.1 Data Acquisition
14.2.2 ANOVA Models for Microarray Data
14.2.3 Experimental Design for Microarrays
14.2.4 Data Transformations
14.2.5 Algorithms for Computing ANOVA Estimates
14.2.6 Statistical Inference
14.2.7 Cluster Analysis
14.3 Software
14.3.1 Availability
14.3.2 Functionality
14.4 Data Analysis with MAANOVA
14.5 Discussion
15 GeneClust
15.1 Introduction
15.2 Methods
15.2.1 Algorithm
15.2.2 Choice of Cluster Size via the Gap Statistic
15.2.3 Supervised Gene Shaving for Class Discrimination
15.3 Software
15.3.1 The GeneShaving Package
15.3.2 GeneClust: A Faster Implementation of Gene Shaving
15.4 Applications
15.4.1 The Alon Colon Dataset
15.4.2 The NCI60 Dataset
15.5 Discussion
16 POE: Statistical Methods for Qualitative Analysis of Gene Expression
16.1 Introduction
16.2 Methodology
16.2.1 Mixture Model for Gene Expression
16.2.2 Useful Representations of the Results
16.2.3 Bayesian Hierarchical Model Formulation
16.2.4 Restrictions to Remove Ambiguity in the Case of Only Two Components
16.2.5 Mining for Subsets of Genes
16.2.6 Creating Molecular Profiles
16.3 R Software Extension: POE
16.3.1 An Example of Using POE on Simulated Data
16.3.2 Estimating Posterior Probability of Expression Using poe.fit
16.3.3 Visualization Tools
16.3.4 Gene-Mining Functions
16.3.5 Molecular Profiling Tool
16.4 Results of POE Applied to Lung Cancer Data
16.5 Discussion and Future Work
17 Bayesian Decomposition
17.1 Introduction
17.1.1 Role of Signaling and Metabolic Pathways
17.1.2 Gene Expression Microarrays
17.2 Methods
17.2.1 Matrix Decomposition
17.2.2 Markov Chain Monte Carlo
17.2.3 Bayesian Framework
17.2.4 The Prior Distribution
17.2.5 Summary Statistics
17.3 Software
17.3.1 Implementation
17.3.2 Files and Installation
17.3.3 Issues in the Application of Bayesian Decomposition
17.4 Application of Bayesian Decomposition to Yeast Cell Cycle Data
17.4.1 Preparation of the Data
17.4.2 Running the Program
17.4.3 Visualizing the Output
17.4.4 Interpretation
17.4.5 Advantages of Bayesian Decomposition
17.5 Discussion
18 Bayesian Clustering of Gene Expression Dynamics
18.1 Introduction
18.2 Methods
18.2.1 Modeling Time
18.2.2 Probabilistic Scoring Metric
18.2.3 Heuristic Search
18.2.4 Statistical Diagnostics
18.3 Software
18.3.1 Screen 0: Welcome Screen
18.3.2 Screen 1: Getting Started
18.3.3 Screen 2: Analysis
18.3.4 Screen 3: Cluster Model
18.3.5 Screen 4: Pack and Go!
18.4 Application
18.4.1 Analysis
18.4.2 Statistical Diagnostics
18.4.3 Understanding the Model
18.5 Conclusions
19 Relevance Networks: A First Step Toward Finding Genetic Regulatory Networks Within Microarray Data
19.1 Introduction
19.1.1 Advantages of Relevance Networks
19.2 Methodology
19.2.1 Formal Definition of Relevance Networks
19.2.2 Finding Regulatory Networks in Phenotypic Data
19.2.3 Using Entropy and Mutual Information to Evaluate Gene-Gene Associations
19.3 Applications
19.3.1 Finding Pharmacogenomic Regulatory Networks
19.3.2 Setting the Threshold
19.4 Software
Index