Title:

Life science data mining

Series:

Science, engineering and biology informatics ; 2

Publication Information:

Hackensack, NJ : World Scientific Publishing, 2006

ISBN:

9789812700643

### Available

Item Barcode | Call Number | Material Type |
---|---|---|
30000010139012 | QH324.2 L53 2006 | Open Access Book |

### Summary

"This book identifies and highlights the latest data mining paradigms to analyze, combine, integrate, model, and simulate vast amounts of heterogeneous, multi-modal, multi-scale data for emerging real-world applications in life science."--BOOK JACKET.

### Table of Contents

Preface | p. v |

Chapter 1 Survey of Early Warning Systems for Environmental and Public Health Applications | p. 1 |

1 Introduction | p. 1 |

2 Disease Surveillance | p. 3 |

3 Reference Architecture for Model Extraction | p. 5 |

4 Problem Domain | p. 9 |

5 Data Sources | p. 10 |

6 Detection Methods | p. 12 |

7 Summary and Conclusion | p. 13 |

References | p. 14 |

Chapter 2 Time-Lapse Cell Cycle Quantitative Data Analysis Using Gaussian Mixture Models | p. 17 |

1 Introduction | p. 18 |

2 Material and Feature Extraction | p. 20 |

2.1 Material and cell feature extraction | p. 20 |

2.2 Model the time-lapse data using AR model | p. 23 |

3 Problem Statement and Formulation | p. 24 |

4 Classification Methods | p. 26 |

4.1 Gaussian mixture models and the EM algorithm | p. 26 |

4.2 K-Nearest Neighbor (KNN) classifier | p. 28 |

4.3 Neural networks | p. 28 |

4.4 Decision tree | p. 29 |

4.5 Fisher clustering | p. 30 |

5 Experimental Results | p. 30 |

5.1 Trace identification | p. 31 |

5.2 Cell morphologic similarity analysis | p. 33 |

5.3 Phase identification | p. 35 |

5.4 Cluster analysis of time-lapse data | p. 37 |

6 Conclusion | p. 40 |

Appendix A | p. 41 |

Appendix B | p. 42 |

References | p. 43 |

Chapter 3 Diversity and Accuracy of Data Mining Ensemble | p. 47 |

1 Introduction | p. 47 |

2 Ensemble and Diversity | p. 49 |

2.1 Why needs diversity? | p. 49 |

2.2 Diversity measures | p. 51 |

3 Probability Analysis | p. 52 |

4 Coincident Failure Diversity | p. 52 |

5 Ensemble Accuracy | p. 55 |

5.1 Relationship between random guess and accuracy of lower bound single models | p. 55 |

5.2 Relationship between accuracy A and the number of models N | p. 56 |

5.3 When model's accuracy < 50% | p. 57 |

6 Construction of Effective Ensembles | p. 58 |

6.1 Strategies for increasing diversity | p. 59 |

6.2 Ensembles of neural networks | p. 60 |

6.3 Ensembles of decision trees | p. 61 |

6.4 Hybrid ensembles | p. 62 |

7 An Application: Osteoporosis Classification Problem | p. 62 |

7.1 Osteoporosis problem | p. 63 |

7.2 Results from the ensembles of neural nets | p. 63 |

7.3 Results from ensembles of the decision trees | p. 66 |

7.4 Results of hybrid ensembles | p. 67 |

8 Discussion and Conclusions | p. 68 |

References | p. 70 |

Chapter 4 Integrated Clustering for Microarray Data | p. 73 |

1 Introduction | p. 73 |

2 Related Work | p. 77 |

3 Data Preprocessing | p. 81 |

4 Integrated Clustering | p. 83 |

4.1 Clustering algorithms | p. 83 |

4.2 Integration methodology | p. 88 |

5 Experimental Evaluation | p. 89 |

5.1 Evaluation methodology | p. 89 |

5.2 Results | p. 91 |

5.3 Discussion | p. 93 |

6 Conclusions | p. 94 |

References | p. 94 |

Chapter 5 Complexity and Synchronization of EEG with Parametric Modeling | p. 99 |

1 Introduction | p. 100 |

1.1 Brief review of EEG recording analysis | p. 100 |

1.2 AR modeling based EEG analysis | p. 101 |

2 TVAR Modeling | p. 104 |

3 Complexity Measure | p. 105 |

4 Synchronization Measure | p. 109 |

5 Conclusions | p. 113 |

References | p. 114 |

Chapter 6 Bayesian Fusion of Syndromic Surveillance with Sensor Data for Disease Outbreak Classification | p. 119 |

1 Introduction | p. 120 |

2 Approach | p. 122 |

2.1 Bayesian belief networks | p. 122 |

2.2 Syndromic data | p. 126 |

2.3 Environmental data | p. 128 |

2.4 Test scenarios | p. 130 |

2.5 Evaluation metrics | p. 130 |

3 Results | p. 131 |

3.1 Scenario 1 | p. 131 |

3.2 Scenario 2 | p. 134 |

3.3 Promptness | p. 135 |

4 Summary and Conclusions | p. 136 |

References | p. 137 |

Chapter 7 An Evaluation of Over-the-Counter Medication Sales for Syndromic Surveillance | p. 143 |

1 Introduction | p. 143 |

2 Background and Related Work | p. 144 |

3 Data | p. 144 |

4 Approaches | p. 145 |

4.1 Lead-lag correlation analysis | p. 145 |

4.2 Regression test of predictive ability | p. 146 |

4.3 Detection-based approaches | p. 148 |

4.4 Supervised algorithm for outbreak detection in OTC data | p. 148 |

4.5 Modified Holt-Winters forecaster | p. 150 |

4.6 Forecasting based on multi-channel regression | p. 151 |

5 Experiments | p. 153 |

5.1 Lead-lag correlation analysis of OTC data | p. 153 |

5.2 Regression test of the predictive value of OTC | p. 154 |

5.3 Results from detection-based approaches | p. 156 |

6 Conclusions and Future Work | p. 158 |

References | p. 159 |

Chapter 8 Collaborative Health Sentinel | p. 163 |

1 Introduction | p. 163 |

2 Infectious Disease and Existing Health Surveillance Programs | p. 166 |

3 Elements of the Collaborative Health Sentinel (CHS) System | p. 170 |

3.1 Sampling | p. 170 |

3.2 Creating a national health map | p. 177 |

3.3 Detection | p. 177 |

3.4 Reaction | p. 183 |

3.5 Cost considerations | p. 184 |

4 Interaction with the Health Information Technology (HCIT) World | p. 185 |

5 Conclusion | p. 188 |

References | p. 189 |

Appendix A HL7 | p. 192 |

Chapter 9 A Multi-Modal System Approach for Drug Abuse Research and Treatment Evaluation: Information Systems Needs and Challenges | p. 195 |

1 Introduction | p. 195 |

2 Context | p. 198 |

2.1 Data sources | p. 198 |

2.2 Examples of relevant questions | p. 199 |

3 Possible System Structure | p. 201 |

4 Challenges in System Development and Implementation | p. 204 |

4.1 Ontology development | p. 204 |

4.2 Data source control, proprietary issues | p. 205 |

4.3 Privacy, security issues | p. 205 |

4.4 Costs to implement/maintain system | p. 206 |

4.5 Historical hypothesis-testing paradigm | p. 206 |

4.6 Utility, usability, credibility of such a system | p. 206 |

4.7 Funding of system development | p. 207 |

5 Summary | p. 207 |

References | p. 208 |

Chapter 10 Knowledge Representation for Versatile Hybrid Intelligent Processing Applied in Predictive Toxicology | p. 213 |

1 Introduction | p. 214 |

2 Hybrid Intelligent Techniques for Predictive Toxicology Knowledge Representation | p. 217 |

3 XML Schemas for Knowledge Representation and Processing in AI and Predictive Toxicology | p. 218 |

4 Towards a Standard for Chemical Data Representation in Predictive Toxicology | p. 220 |

5 Hybrid Intelligent Systems for Knowledge Representation in Predictive Toxicology | p. 225 |

5.1 A formal description of implicit and explicit knowledge-based intelligent systems | p. 226 |

5.2 An XML schema for hybrid intelligent systems | p. 228 |

6 A Case Study | p. 231 |

6.1 Materials and methods | p. 232 |

6.2 Results | p. 233 |

7 Conclusions | p. 235 |

References | p. 236 |

Chapter 11 Ensemble Classification System Implementation for Biomedical Microarray Data | p. 239 |

1 Introduction | p. 240 |

2 Background | p. 241 |

2.1 Reasons for ensemble | p. 241 |

2.2 Diversity and ensemble | p. 241 |

2.3 Relationship between measures of diversity and combination method | p. 243 |

2.4 Measures of diversity | p. 243 |

2.5 Microarray data | p. 244 |

3 Ensemble Classification System (ECS) Design | p. 245 |

3.1 ECS overview | p. 245 |

3.2 Feature subset selection | p. 247 |

3.3 Base classifiers | p. 248 |

3.4 Combination strategy | p. 249 |

4 Experiments | p. 250 |

4.1 Experimental datasets | p. 250 |

4.2 Experimental results | p. 252 |

5 Conclusion and Further Work | p. 254 |

References | p. 255 |

Chapter 12 An Automated Method for Cell Phase Identification in High Throughput Time-Lapse Screens | p. 257 |

1 Introduction | p. 258 |

2 Nuclei Segmentation and Tracking | p. 259 |

3 Cell Phase Identification | p. 260 |

3.1 Feature calculation | p. 260 |

3.2 Identifying cell phase | p. 262 |

3.3 Correcting cell phase identification errors | p. 265 |

4 Experimental Results | p. 266 |

5 Conclusion | p. 272 |

References | p. 272 |

Chapter 13 Inference of Transcriptional Regulatory Networks Based on Cancer Microarray Data | p. 275 |

1 Introduction | p. 275 |

2 Subnetworks and Transcriptional Regulatory Networks Inference | p. 277 |

2.1 Inferring subnetworks using z-score | p. 277 |

2.2 Inferring subnetworks based on graph theory | p. 278 |

2.3 Inferring subnetworks based on Bayesian networks | p. 279 |

2.4 Inferring transcriptional regulatory networks based on integrated expression and sequence data | p. 283 |

3 Multinomial Probit Regression with Bayesian Gene Selection | p. 284 |

3.1 Problem formulation | p. 284 |

3.2 Bayesian variable selection | p. 286 |

3.3 Bayesian estimation using the strongest genes | p. 288 |

3.4 Experimental results | p. 289 |

4 Network Construction Based on Clustering and Predictor Design | p. 293 |

4.1 Predictor construction using reversible jump MCMC annealing | p. 293 |

4.2 CoD for predictors | p. 295 |

4.3 Experimental results on a Myeloid line | p. 296 |

5 Concluding Remarks | p. 298 |

References | p. 299 |

Chapter 14 Data Mining in Biomedicine | p. 305 |

1 Introduction | p. 305 |

2 Predictive Model Construction | p. 306 |

2.1 Derivation of unsupervised models | p. 307 |

2.2 Derivation of supervised models | p. 311 |

3 Validation | p. 316 |

4 Impact Analysis | p. 318 |

5 Summary | p. 319 |

References | p. 319 |

Chapter 15 Mining Multilevel Association Rules from Gene Ontology and Microarray Data | p. 321 |

1 Introduction | p. 321 |

2 Proposed Methods | p. 323 |

2.1 Preprocessing | p. 323 |

2.2 Hierarchy-information encoding | p. 324 |

3 The MAGO Algorithm | p. 326 |

3.1 MAGO algorithm | p. 327 |

3.2 CMAGO (Constrained Multilevel Association rules with Gene Ontology) | p. 329 |

4 Experimental Results | p. 330 |

4.1 The characteristic of the dataset | p. 331 |

4.2 Experimental results | p. 331 |

4.3 Interpretation | p. 334 |

5 Concluding Remarks | p. 335 |

References | p. 336 |

Chapter 16 A Proposed Sensor-Configuration and Sensitivity Analysis of Parameters with Applications to Biosensors | p. 339 |

1 Introduction | p. 340 |

2 Sensor-System Configuration | p. 342 |

3 Optical Biosensors | p. 346 |

3.1 Relationship between parameters | p. 347 |

3.2 Modelling of parameters | p. 351 |

4 Discussion | p. 356 |

Conclusion | p. 358 |

References | p. 359 |

Epilogue | p. 361 |

References | p. 364 |

Index | p. 365 |