Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010119440 | QP624.5.D726 A52 2003 | Open Access Book | Book |
Summary
The development of technologies for high-throughput measurement of gene expression in biological systems is providing powerful new tools for investigating the transcriptome on a genomic scale, and across diverse biological systems and experimental designs. This technological transformation is generating an increasing demand for data analysis in biological investigations of gene expression. This book focuses on data analysis of gene expression microarrays. The goal is to provide guidance to practitioners in deciding which statistical approaches and packages may be indicated for their projects, in choosing among the various options provided by those packages, and in correctly interpreting the results. The book is a collection of chapters written by authors of statistical software for microarray data analysis. Each chapter describes the conceptual and methodological underpinning of data analysis tools as well as their software implementation, and will enable readers to both understand and implement an analysis approach. Methods touch on all aspects of statistical analysis of microarrays, from annotation and filtering to clustering and classification. All software packages described are free to academic users. The materials presented cover a range of software tools designed for varied audiences. Some chapters describe simple menu-driven software in a user-friendly fashion and are designed to be accessible to microarray data analysts without formal quantitative training. Most chapters are directed at microarray data analysts with master's-level training in computer science, biostatistics, or bioinformatics. A minority of more advanced chapters are intended for doctoral students and researchers.
Table of Contents
Preface | p. v |
Contributors | p. xvii |
Color Insert | |
1 The Analysis of Gene Expression Data: An Overview of Methods and Software | p. 1 |
1.1 Measuring Gene Expression Using Microarrays | p. 1 |
1.1.1 Microarray Technologies | p. 1 |
1.1.2 Sources of Variation in Gene Expression Measurements Using Microarrays | p. 4 |
1.1.3 Phases of Microarray Data Analysis | p. 5 |
1.2 Design of Microarray Experiments | p. 7 |
1.2.1 Replication and Sample Size Considerations | p. 7 |
1.2.2 Design of Two-Channel Arrays | p. 9 |
1.3 Data Storage | p. 9 |
1.3.1 Databases | p. 9 |
1.3.2 Standards | p. 10 |
1.3.3 Statistical Analysis Languages | p. 11 |
1.4 Preprocessing | p. 12 |
1.4.1 Image Analysis | p. 12 |
1.4.2 Visualizations for Quality Control | p. 12 |
1.4.3 Background Subtraction | p. 13 |
1.4.4 Probe-level Analysis of Oligonucleotide Arrays | p. 14 |
1.4.5 Within-Array Normalization of cDNA Arrays | p. 15 |
1.4.6 Normalization Across Arrays | p. 15 |
1.5 Screening for Differentially Expressed Genes | p. 16 |
1.5.1 Estimation or Selection? | p. 16 |
1.5.2 One Problem or Many? | p. 17 |
1.5.3 Selection and False Discovery Rates | p. 18 |
1.5.4 Beyond Two Groups | p. 19 |
1.6 Challenges of Genome Biometry Analyses | p. 19 |
1.7 Visualization and Unsupervised Analyses | p. 21 |
1.7.1 Profile Visualization | p. 21 |
1.7.2 Why Clustering? | p. 22 |
1.7.3 Hierarchical Clustering | p. 23 |
1.7.4 k-Means Clustering and Self-Organizing Maps | p. 25 |
1.7.5 Model-Based Clustering | p. 26 |
1.7.6 Principal Components Analysis | p. 26 |
1.7.7 Multidimensional Scaling | p. 27 |
1.7.8 Identifying Novel Molecular Subclasses | p. 27 |
1.7.9 Time Series Analysis | p. 28 |
1.8 Prediction | p. 29 |
1.8.1 Prediction Tools | p. 29 |
1.8.2 Dimension Reduction | p. 30 |
1.8.3 Evaluation of Classifiers | p. 30 |
1.8.4 Regression-Based Approaches | p. 31 |
1.8.5 Classification Trees | p. 31 |
1.8.6 Probabilistic Model-Based Classification | p. 32 |
1.8.7 Discriminant Analysis | p. 33 |
1.8.8 Nearest-Neighbor Classifiers | p. 33 |
1.8.9 Support Vector Machines | p. 33 |
1.9 Free and Open-Source Software | p. 33 |
1.9.1 Whitehead Institute Tools | p. 34 |
1.9.2 Eisen Lab Tools | p. 34 |
1.9.3 TIGR Tools | p. 34 |
1.9.4 GeneX and CyberT | p. 35 |
1.9.5 Projects and NCBI | p. 35 |
1.9.6 BRB | p. 35 |
1.9.7 The OOML library | p. 36 |
1.9.8 MatArray | p. 36 |
1.9.9 BASE | p. 36 |
1.10 Conclusion | p. 36 |
2 Visualization and Annotation of Genomic Experiments | p. 46 |
2.1 Introduction | p. 46 |
2.2 Motivations for Component-Based Software | p. 47 |
2.3 Formalism | p. 49 |
2.4 Bioconductor Software for Filtering, Exploring, and Interpreting Microarray Experiments | p. 50 |
2.4.1 Formal Data Structures and Methods for Multiple Microarrays | p. 50 |
2.4.2 Tools for Filtering Gene Expression Data: The Closure Concept | p. 54 |
2.4.3 Expression Density Diagnostics: High-Throughput Exploratory Data Analysis for Microarrays | p. 55 |
2.4.4 Annotation | p. 57 |
2.5 Visualization | p. 58 |
2.5.1 Chromosomes | p. 59 |
2.6 Applications | p. 64 |
2.6.1 A Case Study of Gene Filtering | p. 64 |
2.6.2 Application of Expression Density Diagnostics | p. 67 |
2.7 Conclusions | p. 70 |
3 Bioconductor R Packages for Exploratory Analysis and Normalization of cDNA Microarray Data | p. 73 |
3.1 Introduction | p. 73 |
3.1.1 Overview of Packages | p. 73 |
3.1.2 Two-Color cDNA Microarray Experiments | p. 75 |
3.2 Methods | p. 76 |
3.2.1 Standards for Microarray Data | p. 76 |
3.2.2 Object-Oriented Programming: Microarray Classes and Methods | p. 77 |
3.2.3 Diagnostic Plots | p. 78 |
3.2.4 Normalization Using Robust Local Regression | p. 79 |
3.3 Application: Swirl Microarray Experiment | p. 80 |
3.4 Software | p. 81 |
3.4.1 Package marrayClasses--Classes and Methods for cDNA Microarray Data | p. 81 |
3.4.2 Package marrayInput--Data Input for cDNA Microarrays | p. 89 |
3.4.3 Package marrayPlots--Diagnostic Plots for cDNA Microarray Data | p. 91 |
3.4.4 Package marrayNorm--Location and Scale Normalization for cDNA Microarray Data | p. 96 |
3.5 Discussion | p. 99 |
4 An R Package for Analyses of Affymetrix Oligonucleotide Arrays | p. 102 |
4.1 Introduction | p. 102 |
4.2 Methods | p. 103 |
4.2.1 Notation | p. 103 |
4.2.2 The CEL/CDF Convention | p. 104 |
4.2.3 Probe Pair Sets | p. 106 |
4.2.4 Probe-Level Objects | p. 107 |
4.2.5 Normalization | p. 108 |
4.2.6 Exploratory Data Analysis of Probe-Level Data | p. 111 |
4.3 Application | p. 113 |
4.3.1 Expression Measures | p. 113 |
4.4 Software | p. 115 |
4.4.1 A Case Study | p. 115 |
4.4.2 Extending the Package | p. 118 |
4.5 Conclusion | p. 118 |
5 DNA-Chip Analyzer (dChip) | p. 120 |
5.1 Introduction | p. 120 |
5.2 Methods | p. 121 |
5.2.1 Normalization of Arrays Based on an "Invariant Set" | p. 121 |
5.2.2 Model-Based Analysis of Oligonucleotide Arrays | p. 122 |
5.2.3 Confidence Interval for Fold Change | p. 122 |
5.2.4 Pooling Replicate Arrays Considering Measurement Accuracy | p. 124 |
5.3 Software and Applications | p. 125 |
5.3.1 Reading in Array Data Files | p. 125 |
5.3.2 Viewing an Array Image | p. 127 |
5.3.3 Normalizing Arrays | p. 129 |
5.3.4 Viewing PM/MM Data | p. 129 |
5.3.5 Calculating Model-Based Expression Indexes | p. 131 |
5.3.6 Filter Genes | p. 132 |
5.3.7 Hierarchical Clustering | p. 133 |
5.3.8 Comparing Samples | p. 135 |
5.3.9 Mapping Genes to Chromosomes | p. 137 |
5.3.10 Sample Classification by Linear Discriminant Analysis | p. 138 |
5.4 Discussion | p. 139 |
6 Expression Profiler | p. 142 |
6.1 Introduction | p. 142 |
6.2 EPCLUST | p. 143 |
6.2.1 EPCLUST: Data Import | p. 143 |
6.2.2 EPCLUST: Data Filtering | p. 144 |
6.2.3 EPCLUST: Data Annotation | p. 146 |
6.2.4 EPCLUST: Data Environment | p. 147 |
6.2.5 EPCLUST: Data Analysis | p. 148 |
6.3 URLMAP: Cross-Linking of the Analysis Results Between the Tools and Databases | p. 151 |
6.4 EP:GO GeneOntology Browser | p. 152 |
6.5 EP:PPI: Comparison of Protein Pairs and Expression | p. 153 |
6.6 Pattern Discovery, Pattern Matching, and Visualization Tools | p. 154 |
6.7 An Example of the Data Analysis and Visualizations Performed by the Tools in Expression Profiler | p. 154 |
6.8 Integration of Expression Profiler with Public Microarray Databases | p. 159 |
6.9 Conclusions | p. 160 |
7 An S-PLUS Library for the Analysis and Visualization of Differential Expression | p. 163 |
7.1 Introduction | p. 163 |
7.2 Assessment of Differential Expression | p. 164 |
7.2.1 Local Pooled Error | p. 165 |
7.2.2 Tests for Differential Expression | p. 169 |
7.2.3 Cluster Analysis and Visualization | p. 171 |
7.3 Analysis of Melanoma Expression | p. 174 |
7.3.1 Tests for Differential Expression | p. 175 |
7.3.2 Cluster Analysis and Visualization | p. 178 |
7.3.3 Annotation | p. 180 |
7.4 Discussion | p. 181 |
8 DRAGON and DRAGON View: Methods for the Annotation, Analysis, and Visualization of Large-Scale Gene Expression Data | p. 185 |
8.1 Introduction | p. 185 |
8.2 System and Methods | p. 189 |
8.2.1 Overview of DRAGON | p. 189 |
8.2.2 DRAGON's Hardware, Software, and Database Architecture | p. 190 |
8.2.3 Cross-Referencing Information in DRAGON | p. 192 |
8.2.4 The DRAGON Search and Annotate Tools | p. 193 |
8.2.5 The DRAGON View Data Visualization Tools | p. 196 |
8.2.6 DRAGON Gram: A Novel Visualization Tool | p. 198 |
8.3 Implementation | p. 199 |
8.4 Discussion and Conclusion | p. 204 |
9 SNOMAD: Biologist-Friendly Web Tools for the Standardization and NOrmalization of Microarray Data | p. 210 |
9.1 Introduction | p. 210 |
9.2 Methods and Application | p. 212 |
9.2.1 Overview of Experimental and Data Analysis Procedures | p. 212 |
9.2.2 Background Subtraction | p. 214 |
9.2.3 Global Mean Normalization | p. 214 |
9.2.4 Standard Data Transformation and Visualization Methods | p. 215 |
9.2.5 Local Mean Normalization Across Element Signal Intensity | p. 217 |
9.2.6 Local Variance Correction Across Element Signal Intensity | p. 219 |
9.2.7 Local Mean Normalization Across the Microarray Surface | p. 223 |
9.3 Software | p. 225 |
9.4 Discussion | p. 226 |
10 Microarray Analysis Using the MicroArray Explorer | p. 229 |
10.1 Introduction | p. 229 |
10.1.1 Need for the Methodology | p. 230 |
10.1.2 Basic Ideas Behind the Approach | p. 231 |
10.2 Methods--Statistical and Informatics Basis | p. 232 |
10.2.1 Analysis Paradigm | p. 235 |
10.2.2 Particular Analysis Methods | p. 238 |
10.2.3 Data Conversion | p. 238 |
10.3 Software | p. 239 |
10.3.1 System Design--Software Implementation | p. 244 |
10.3.2 How to Download the Software | p. 247 |
10.3.3 Strengths and Weaknesses of the Approach | p. 248 |
10.4 Applications | p. 249 |
10.5 Discussion | p. 251 |
11 Parametric Empirical Bayes Methods for Microarrays | p. 254 |
11.1 Introduction | p. 254 |
11.2 EB Methods | p. 256 |
11.2.1 Canonical EB Example | p. 256 |
11.2.2 General Model Structure: Two Conditions | p. 256 |
11.2.3 Multiple Conditions | p. 258 |
11.2.4 The Gamma-Gamma and Lognormal-Normal Models | p. 259 |
11.2.5 Model Fitting | p. 260 |
11.3 Software | p. 261 |
11.4 Application | p. 263 |
11.5 Discussion | p. 269 |
12 SAM Thresholding and False Discovery Rates for Detecting Differential Gene Expression in DNA Microarrays | p. 272 |
12.1 Introduction | p. 272 |
12.2 Methods and Applications | p. 273 |
12.2.1 Multiple Hypothesis Testing | p. 273 |
12.2.2 An Application | p. 275 |
12.2.3 Forming the Test Statistics | p. 276 |
12.2.4 Calculating the Null Distribution | p. 277 |
12.2.5 The SAM Thresholding Procedure | p. 278 |
12.2.6 Estimating False-Discovery Rates | p. 280 |
12.3 Software | p. 283 |
12.3.1 Obtaining the Software | p. 283 |
12.3.2 Data Formats | p. 283 |
12.3.3 Response Format | p. 284 |
12.3.4 Example Input Data File for an Unpaired Problem | p. 285 |
12.3.5 Block Permutations | p. 285 |
12.3.6 Normalization of Experiments | p. 285 |
12.3.7 Handling Missing Data | p. 287 |
12.3.8 Running SAM | p. 287 |
12.3.9 Format of the Significant Gene List | p. 288 |
12.4 Discussion | p. 289 |
13 Adaptive Gene Picking with Microarray Data: Detecting Important Low Abundance Signals | p. 291 |
13.1 Introduction | p. 291 |
13.2 Methods | p. 292 |
13.2.1 Background Subtraction | p. 292 |
13.2.2 Transformation to Approximate Normality | p. 293 |
13.2.3 Differential Expression Across Conditions | p. 295 |
13.2.4 Robust Center and Spread | p. 297 |
13.2.5 Formal Evaluation of Significant Differential Expression | p. 299 |
13.2.6 Simulation Studies | p. 301 |
13.2.7 Comparison of Methods with E. coli Data | p. 304 |
13.3 Software | p. 304 |
13.4 Application | p. 306 |
13.4.1 Diabetes and Obesity Studies | p. 306 |
13.4.2 Software Example | p. 308 |
14 MAANOVA: A Software Package for the Analysis of Spotted cDNA Microarray Experiments | p. 313 |
14.1 Introduction | p. 313 |
14.2 Methods | p. 314 |
14.2.1 Data Acquisition | p. 315 |
14.2.2 ANOVA Models for Microarray Data | p. 315 |
14.2.3 Experimental Design for Microarrays | p. 317 |
14.2.4 Data Transformations | p. 321 |
14.2.5 Algorithms for Computing ANOVA Estimates | p. 322 |
14.2.6 Statistical Inference | p. 323 |
14.2.7 Cluster Analysis | p. 327 |
14.3 Software | p. 328 |
14.3.1 Availability | p. 328 |
14.3.2 Functionality | p. 329 |
14.4 Data Analysis with MAANOVA | p. 334 |
14.5 Discussion | p. 339 |
15 GeneClust | p. 342 |
15.1 Introduction | p. 342 |
15.2 Methods | p. 343 |
15.2.1 Algorithm | p. 343 |
15.2.2 Choice of Cluster Size via the Gap Statistic | p. 344 |
15.2.3 Supervised Gene Shaving for Class Discrimination | p. 346 |
15.3 Software | p. 347 |
15.3.1 The GeneShaving Package | p. 347 |
15.3.2 GeneClust: A Faster Implementation of Gene Shaving | p. 352 |
15.4 Applications | p. 354 |
15.4.1 The Alon Colon Dataset | p. 354 |
15.4.2 The NCI60 Dataset | p. 356 |
15.5 Discussion | p. 358 |
16 POE: Statistical Methods for Qualitative Analysis of Gene Expression | p. 362 |
16.1 Introduction | p. 362 |
16.2 Methodology | p. 364 |
16.2.1 Mixture Model for Gene Expression | p. 364 |
16.2.2 Useful Representations of the Results | p. 366 |
16.2.3 Bayesian Hierarchical Model Formulation | p. 367 |
16.2.4 Restrictions to Remove Ambiguity in the Case of Only Two Components | p. 368 |
16.2.5 Mining for Subsets of Genes | p. 368 |
16.2.6 Creating Molecular Profiles | p. 370 |
16.3 R Software Extension: POE | p. 371 |
16.3.1 An Example of Using POE on Simulated Data | p. 371 |
16.3.2 Estimating Posterior Probability of Expression Using poe.fit | p. 372 |
16.3.3 Visualization Tools | p. 374 |
16.3.4 Gene-Mining Functions | p. 377 |
16.3.5 Molecular Profiling Tool | p. 379 |
16.4 Results of POE Applied to Lung Cancer Data | p. 381 |
16.5 Discussion and Future Work | p. 384 |
17 Bayesian Decomposition | p. 388 |
17.1 Introduction | p. 388 |
17.1.1 Role of Signaling and Metabolic Pathways | p. 388 |
17.1.2 Gene Expression Microarrays | p. 389 |
17.2 Methods | p. 390 |
17.2.1 Matrix Decomposition | p. 390 |
17.2.2 Markov Chain Monte Carlo | p. 391 |
17.2.3 Bayesian Framework | p. 392 |
17.2.4 The Prior Distribution | p. 393 |
17.2.5 Summary Statistics | p. 395 |
17.3 Software | p. 396 |
17.3.1 Implementation | p. 396 |
17.3.2 Files and Installation | p. 396 |
17.3.3 Issues in the Application of Bayesian Decomposition | p. 397 |
17.4 Application of Bayesian Decomposition to Yeast Cell Cycle Data | p. 398 |
17.4.1 Preparation of the Data | p. 398 |
17.4.2 Running the Program | p. 399 |
17.4.3 Visualizing the Output | p. 400 |
17.4.4 Interpretation | p. 402 |
17.4.5 Advantages of Bayesian Decomposition | p. 403 |
17.5 Discussion | p. 403 |
18 Bayesian Clustering of Gene Expression Dynamics | p. 409 |
18.1 Introduction | p. 409 |
18.2 Methods | p. 411 |
18.2.1 Modeling Time | p. 412 |
18.2.2 Probabilistic Scoring Metric | p. 413 |
18.2.3 Heuristic Search | p. 415 |
18.2.4 Statistical Diagnostics | p. 416 |
18.3 Software | p. 417 |
18.3.1 Screen 0: Welcome Screen | p. 417 |
18.3.2 Screen 1: Getting Started | p. 418 |
18.3.3 Screen 2: Analysis | p. 418 |
18.3.4 Screen 3: Cluster Model | p. 419 |
18.3.5 Screen 4: Pack and Go! | p. 419 |
18.4 Application | p. 420 |
18.4.1 Analysis | p. 420 |
18.4.2 Statistical Diagnostics | p. 421 |
18.4.3 Understanding the Model | p. 421 |
18.5 Conclusions | p. 424 |
19 Relevance Networks: A First Step Toward Finding Genetic Regulatory Networks Within Microarray Data | p. 428 |
19.1 Introduction | p. 428 |
19.1.1 Advantages of Relevance Networks | p. 429 |
19.2 Methodology | p. 431 |
19.2.1 Formal Definition of Relevance Networks | p. 431 |
19.2.2 Finding Regulatory Networks in Phenotypic Data | p. 432 |
19.2.3 Using Entropy and Mutual Information to Evaluate Gene-Gene Associations | p. 434 |
19.3 Applications | p. 437 |
19.3.1 Finding Pharmacogenomic Regulatory Networks | p. 437 |
19.3.2 Setting the Threshold | p. 439 |
19.4 Software | p. 440 |
Index | p. 447 |