Title:
Life science data mining
Series:
Science, engineering and biology informatics ; 2
Publication Information:
Hackensack, NJ : World Scientific Publishing, 2006
ISBN:
9789812700643
Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010139012 | QH324.2 L53 2006 | Open Access Book | Book |
Summary
"This book identifies and highlights the latest data mining paradigms to analyze, combine, integrate, model and simulate vast amounts of heterogeneous multi-modal, multi-scale data for emerging real-world applications in life science."--BOOK JACKET.
Table of Contents
Preface | p. v |
Chapter 1 Survey of Early Warning Systems for Environmental and Public Health Applications | p. 1 |
1 Introduction | p. 1 |
2 Disease Surveillance | p. 3 |
3 Reference Architecture for Model Extraction | p. 5 |
4 Problem Domain | p. 9 |
5 Data Sources | p. 10 |
6 Detection Methods | p. 12 |
7 Summary and Conclusion | p. 13 |
References | p. 14 |
Chapter 2 Time-Lapse Cell Cycle Quantitative Data Analysis Using Gaussian Mixture Models | p. 17 |
1 Introduction | p. 18 |
2 Material and Feature Extraction | p. 20 |
2.1 Material and cell feature extraction | p. 20 |
2.2 Model the time-lapse data using AR model | p. 23 |
3 Problem Statement and Formulation | p. 24 |
4 Classification Methods | p. 26 |
4.1 Gaussian mixture models and the EM algorithm | p. 26 |
4.2 K-Nearest Neighbor (KNN) classifier | p. 28 |
4.3 Neural networks | p. 28 |
4.4 Decision tree | p. 29 |
4.5 Fisher clustering | p. 30 |
5 Experimental Results | p. 30 |
5.1 Trace identification | p. 31 |
5.2 Cell morphologic similarity analysis | p. 33 |
5.3 Phase identification | p. 35 |
5.4 Cluster analysis of time-lapse data | p. 37 |
6 Conclusion | p. 40 |
Appendix A | p. 41 |
Appendix B | p. 42 |
References | p. 43 |
Chapter 3 Diversity and Accuracy of Data Mining Ensemble | p. 47 |
1 Introduction | p. 47 |
2 Ensemble and Diversity | p. 49 |
2.1 Why needs diversity? | p. 49 |
2.2 Diversity measures | p. 51 |
3 Probability Analysis | p. 52 |
4 Coincident Failure Diversity | p. 52 |
5 Ensemble Accuracy | p. 55 |
5.1 Relationship between random guess and accuracy of lower bound single models | p. 55 |
5.2 Relationship between accuracy A and the number of models N | p. 56 |
5.3 When model's accuracy < 50% | p. 57 |
6 Construction of Effective Ensembles | p. 58 |
6.1 Strategies for increasing diversity | p. 59 |
6.2 Ensembles of neural networks | p. 60 |
6.3 Ensembles of decision trees | p. 61 |
6.4 Hybrid ensembles | p. 62 |
7 An Application: Osteoporosis Classification Problem | p. 62 |
7.1 Osteoporosis problem | p. 63 |
7.2 Results from the ensembles of neural nets | p. 63 |
7.3 Results from ensembles of the decision trees | p. 66 |
7.4 Results of hybrid ensembles | p. 67 |
8 Discussion and Conclusions | p. 68 |
References | p. 70 |
Chapter 4 Integrated Clustering for Microarray Data | p. 73 |
1 Introduction | p. 73 |
2 Related Work | p. 77 |
3 Data Preprocessing | p. 81 |
4 Integrated Clustering | p. 83 |
4.1 Clustering algorithms | p. 83 |
4.2 Integration methodology | p. 88 |
5 Experimental Evaluation | p. 89 |
5.1 Evaluation methodology | p. 89 |
5.2 Results | p. 91 |
5.3 Discussion | p. 93 |
6 Conclusions | p. 94 |
References | p. 94 |
Chapter 5 Complexity and Synchronization of EEG with Parametric Modeling | p. 99 |
1 Introduction | p. 100 |
1.1 Brief review of EEG recording analysis | p. 100 |
1.2 AR modeling based EEG analysis | p. 101 |
2 TVAR Modeling | p. 104 |
3 Complexity Measure | p. 105 |
4 Synchronization Measure | p. 109 |
5 Conclusions | p. 113 |
References | p. 114 |
Chapter 6 Bayesian Fusion of Syndromic Surveillance with Sensor Data for Disease Outbreak Classification | p. 119 |
1 Introduction | p. 120 |
2 Approach | p. 122 |
2.1 Bayesian belief networks | p. 122 |
2.2 Syndromic data | p. 126 |
2.3 Environmental data | p. 128 |
2.4 Test scenarios | p. 130 |
2.5 Evaluation metrics | p. 130 |
3 Results | p. 131 |
3.1 Scenario 1 | p. 131 |
3.2 Scenario 2 | p. 134 |
3.3 Promptness | p. 135 |
4 Summary and Conclusions | p. 136 |
References | p. 137 |
Chapter 7 An Evaluation of Over-the-Counter Medication Sales for Syndromic Surveillance | p. 143 |
1 Introduction | p. 143 |
2 Background and Related Work | p. 144 |
3 Data | p. 144 |
4 Approaches | p. 145 |
4.1 Lead-lag correlation analysis | p. 145 |
4.2 Regression test of predictive ability | p. 146 |
4.3 Detection-based approaches | p. 148 |
4.4 Supervised algorithm for outbreak detection in OTC data | p. 148 |
4.5 Modified Holt-Winters forecaster | p. 150 |
4.6 Forecasting based on multi-channel regression | p. 151 |
5 Experiments | p. 153 |
5.1 Lead-lag correlation analysis of OTC data | p. 153 |
5.2 Regression test of the predictive value of OTC | p. 154 |
5.3 Results from detection-based approaches | p. 156 |
6 Conclusions and Future Work | p. 158 |
References | p. 159 |
Chapter 8 Collaborative Health Sentinel | p. 163 |
1 Introduction | p. 163 |
2 Infectious Disease and Existing Health Surveillance Programs | p. 166 |
3 Elements of the Collaborative Health Sentinel (CHS) System | p. 170 |
3.1 Sampling | p. 170 |
3.2 Creating a national health map | p. 177 |
3.3 Detection | p. 177 |
3.4 Reaction | p. 183 |
3.5 Cost considerations | p. 184 |
4 Interaction with the Health Information Technology (HCIT) World | p. 185 |
5 Conclusion | p. 188 |
References | p. 189 |
Appendix A HL7 | p. 192 |
Chapter 9 A Multi-Modal System Approach for Drug Abuse Research and Treatment Evaluation: Information Systems Needs and Challenges | p. 195 |
1 Introduction | p. 195 |
2 Context | p. 198 |
2.1 Data sources | p. 198 |
2.2 Examples of relevant questions | p. 199 |
3 Possible System Structure | p. 201 |
4 Challenges in System Development and Implementation | p. 204 |
4.1 Ontology development | p. 204 |
4.2 Data source control, proprietary issues | p. 205 |
4.3 Privacy, security issues | p. 205 |
4.4 Costs to implement/maintain system | p. 206 |
4.5 Historical hypothesis-testing paradigm | p. 206 |
4.6 Utility, usability, credibility of such a system | p. 206 |
4.7 Funding of system development | p. 207 |
5 Summary | p. 207 |
References | p. 208 |
Chapter 10 Knowledge Representation for Versatile Hybrid Intelligent Processing Applied in Predictive Toxicology | p. 213 |
1 Introduction | p. 214 |
2 Hybrid Intelligent Techniques for Predictive Toxicology Knowledge Representation | p. 217 |
3 XML Schemas for Knowledge Representation and Processing in AI and Predictive Toxicology | p. 218 |
4 Towards a Standard for Chemical Data Representation in Predictive Toxicology | p. 220 |
5 Hybrid Intelligent Systems for Knowledge Representation in Predictive Toxicology | p. 225 |
5.1 A formal description of implicit and explicit knowledge-based intelligent systems | p. 226 |
5.2 An XML schema for hybrid intelligent systems | p. 228 |
6 A Case Study | p. 231 |
6.1 Materials and methods | p. 232 |
6.2 Results | p. 233 |
7 Conclusions | p. 235 |
References | p. 236 |
Chapter 11 Ensemble Classification System Implementation for Biomedical Microarray Data | p. 239 |
1 Introduction | p. 240 |
2 Background | p. 241 |
2.1 Reasons for ensemble | p. 241 |
2.2 Diversity and ensemble | p. 241 |
2.3 Relationship between measures of diversity and combination method | p. 243 |
2.4 Measures of diversity | p. 243 |
2.5 Microarray data | p. 244 |
3 Ensemble Classification System (ECS) Design | p. 245 |
3.1 ECS overview | p. 245 |
3.2 Feature subset selection | p. 247 |
3.3 Base classifiers | p. 248 |
3.4 Combination strategy | p. 249 |
4 Experiments | p. 250 |
4.1 Experimental datasets | p. 250 |
4.2 Experimental results | p. 252 |
5 Conclusion and Further Work | p. 254 |
References | p. 255 |
Chapter 12 An Automated Method for Cell Phase Identification in High Throughput Time-Lapse Screens | p. 257 |
1 Introduction | p. 258 |
2 Nuclei Segmentation and Tracking | p. 259 |
3 Cell Phase Identification | p. 260 |
3.1 Feature calculation | p. 260 |
3.2 Identifying cell phase | p. 262 |
3.3 Correcting cell phase identification errors | p. 265 |
4 Experimental Results | p. 266 |
5 Conclusion | p. 272 |
References | p. 272 |
Chapter 13 Inference of Transcriptional Regulatory Networks Based on Cancer Microarray Data | p. 275 |
1 Introduction | p. 275 |
2 Subnetworks and Transcriptional Regulatory Networks Inference | p. 277 |
2.1 Inferring subnetworks using z-score | p. 277 |
2.2 Inferring subnetworks based on graph theory | p. 278 |
2.3 Inferring subnetworks based on Bayesian networks | p. 279 |
2.4 Inferring transcriptional regulatory networks based on integrated expression and sequence data | p. 283 |
3 Multinomial Probit Regression with Bayesian Gene Selection | p. 284 |
3.1 Problem formulation | p. 284 |
3.2 Bayesian variable selection | p. 286 |
3.3 Bayesian estimation using the strongest genes | p. 288 |
3.4 Experimental results | p. 289 |
4 Network Construction Based on Clustering and Predictor Design | p. 293 |
4.1 Predictor construction using reversible jump MCMC annealing | p. 293 |
4.2 CoD for predictors | p. 295 |
4.3 Experimental results on a Myeloid line | p. 296 |
5 Concluding Remarks | p. 298 |
References | p. 299 |
Chapter 14 Data Mining in Biomedicine | p. 305 |
1 Introduction | p. 305 |
2 Predictive Model Construction | p. 306 |
2.1 Derivation of unsupervised models | p. 307 |
2.2 Derivation of supervised models | p. 311 |
3 Validation | p. 316 |
4 Impact Analysis | p. 318 |
5 Summary | p. 319 |
References | p. 319 |
Chapter 15 Mining Multilevel Association Rules from Gene Ontology and Microarray Data | p. 321 |
1 Introduction | p. 321 |
2 Proposed Methods | p. 323 |
2.1 Preprocessing | p. 323 |
2.2 Hierarchy-information encoding | p. 324 |
3 The MAGO Algorithm | p. 326 |
3.1 MAGO algorithm | p. 327 |
3.2 CMAGO (Constrained Multilevel Association rules with Gene Ontology) | p. 329 |
4 Experimental Results | p. 330 |
4.1 The characteristic of the dataset | p. 331 |
4.2 Experimental results | p. 331 |
4.3 Interpretation | p. 334 |
5 Concluding Remarks | p. 335 |
References | p. 336 |
Chapter 16 A Proposed Sensor-Configuration and Sensitivity Analysis of Parameters with Applications to Biosensors | p. 339 |
1 Introduction | p. 340 |
2 Sensor-System Configuration | p. 342 |
3 Optical Biosensors | p. 346 |
3.1 Relationship between parameters | p. 347 |
3.2 Modelling of parameters | p. 351 |
4 Discussion | p. 356 |
5 Conclusion | p. 358 |
References | p. 359 |
Epilogue | p. 361 |
References | p. 364 |
Index | p. 365 |