Title:
Parallel computing for bioinformatics and computational biology : models, enabling technologies, and case studies
Publication Information:
Hoboken , NJ : John Wiley & Sons, 2006
ISBN:
9780471718482
Item Barcode:
30000010113134
Call Number:
QH324.2 P37 2006
Material Type:
Open Access Book
Item Category 1:
Book

Summary

Discover how to streamline complex bioinformatics applications with parallel computing


This publication enables readers to handle more complex bioinformatics applications and larger and richer data sets. As the editor clearly shows, using powerful parallel computing tools can lead to significant breakthroughs in deciphering genomes, understanding genetic disease, designing customized drug therapies, and understanding evolution.

A broad range of bioinformatics applications is covered, with demonstrations of how each one can be parallelized to improve performance and speed up computation. Current parallel computing techniques and technologies are examined, including distributed computing and grid computing. Readers are given a mixture of algorithms, experiments, and simulations that offer both qualitative and quantitative insights into the dynamic field of bioinformatics.

Parallel Computing for Bioinformatics and Computational Biology is a contributed work that serves as a repository of case studies, collectively demonstrating how parallel computing streamlines difficult problems in bioinformatics and produces better results. Each of the chapters is authored by an established expert in the field and carefully edited to ensure a consistent approach and high standard throughout the publication.

The work is organized into five parts:
* Algorithms and models
* Sequence analysis and microarrays
* Phylogenetics
* Protein folding
* Platforms and enabling technologies

Researchers, educators, and students in the field of bioinformatics will discover how high-performance computing can enable them to handle more complex data sets, gain deeper insights, and make new discoveries.


Author Notes

ALBERT Y. ZOMAYA is the CISCO Systems Chair Professor of Internetworking, School of Information Technologies, The University of Sydney, and Deputy Director for Information Technology of the Sydney University Biological Informatics and Technology Centre. Professor Zomaya has been the Chair of the IEEE Technical Committee on Parallel Processing and has been awarded the IEEE Computer Society's Meritorious Service Award. He is an IEEE Fellow.


Reviews

Choice Review

Data is spewing forth from biological labs at an incredible yet accelerating rate. A lab book and a pen might once have been sufficient for experiments that took months or years for an outcome, but new methods are needed for this Niagara of data. Bioinformatics is the attempt to use techniques from computer and information science to manage and analyze this large amount of complex data. Computational biology uses mathematical modeling and computational simulation techniques to study biological systems. Whether or not there is a worthwhile distinction between these areas, clearly they both need to use the latest and best in computer technology. This book discusses the uses of parallel computing in these areas from a variety of viewpoints. Some chapters focus on the biology and mathematical models, others look at algorithms and programming languages, and some look at the variety of parallel architectures. As with any multi-authored volume, there is unevenness across chapters, but this book offers a good overview of the current state of computing in these areas. Summing Up: Recommended. Graduate students through professionals. P. Cull, Oregon State University


Table of Contents

Nouhad J. Rizk, Jack da Silva, Christophe Chassagnole, K. Burrage, Ning Kang, Azzedine Boukerche, Bertil Schmidt, Amitava Datta, Xue Wu, Vipin Chaudhary, Robert L. Martino, Gabriel Luque, Mohammed Khabzaoui, Alexandros Stamatakis, Ekkehard Petzold, Tiffani L. Williams, Ruhong Zhou, R. Andonov, Richard O. Day, Ali Al Mazari, Shahid H. Bokhari, Kim K. Baldridge, Arun Krishnan, Russ Miller, Rajendra R. Joshi, Hans De Sterck, Arjav J. Chakravarti, H. Simmler, T. Pan
Preface
Contributors
Acknowledgments
Part I Algorithms and Models
1 Parallel and Evolutionary Approaches to Computational Biology
1.1 Introduction
1.2 Bioinformatics
1.3 Evolutionary Computation Applied to Computational Biology
1.4 Conclusions
References
2 Parallel Monte Carlo Simulation of HIV Molecular Evolution in Response to Immune Surveillance
2.1 Introduction
2.2 The Problem
2.3 The Model
2.4 Parallelization with MPI
2.5 Parallel Random Number Generation
2.6 Preliminary Simulation Results
2.7 Future Directions
References
3 Differential Evolutionary Algorithms for In Vivo Dynamic Analysis of Glycolysis and Pentose Phosphate Pathway in Escherichia coli
3.1 Introduction
3.2 Mathematical Model
3.3 Estimation of the Parameters of the Model
3.4 Kinetic Parameter Estimation by DE
3.5 Simulation and Results
3.6 Stability Analysis
3.7 Control Characteristic
3.8 Conclusions
References
4 Compute-Intensive Simulations for Cellular Models
4.1 Introduction
4.2 Simulation Methods for Stochastic Chemical Kinetics
4.3 Aspects of Biology - Genetic Regulation
4.4 Parallel Computing for Biological Systems
4.5 Parallel Simulations
4.6 Spatial Modeling of Cellular Systems
4.7 Modeling Colonies of Cells
References
5 Parallel Computation in Simulating Diffusion and Deformation in Human Brain
5.1 Introduction
5.2 Anisotropic Diffusion Simulation in White Matter Tractography
5.3 Brain Deformation Simulation in Image-Guided Neurosurgery
5.4 Summary
References
Part II Sequence Analysis and Microarrays
6 Computational Molecular Biology
6.1 Introduction
6.2 Basic Concepts in Molecular Biology
6.3 Global and Local Biological Sequence Alignment
6.4 Heuristic Approaches for Biological Sequence Comparison
6.5 Parallel and Distributed Sequence Comparison
6.6 Conclusions
References
7 Special-Purpose Computing for Biological Sequence Analysis
7.1 Introduction
7.2 Hybrid Parallel Computer
7.3 Dynamic Programming Communication Pattern
7.4 Performance Evaluation
7.5 Future Work and Open Problems
7.6 Tutorial
References
8 Multiple Sequence Alignment in Parallel on a Cluster of Workstations
8.1 Introduction
8.2 CLUSTAL W
8.3 Implementation
8.4 Results
8.5 Conclusion
References
9 Searching Sequence Databases Using High-Performance BLASTs
9.1 Introduction
9.2 Basic BLAST Algorithm
9.3 BLAST Usage and Performance Factors
9.4 High Performance BLASTs
9.5 Comparing BLAST Performance
9.6 UMD-BLAST
9.7 Future Directions
9.8 Related Work
9.9 Summary
References
10 Parallel Implementations of Local Sequence Alignment: Hardware and Software
10.1 Introduction
10.2 Sequence Alignment Primer
10.3 Smith-Waterman Algorithm
10.4 FASTA
10.5 BLAST
10.6 HMMER - Hidden Markov Models
10.7 ClustalW
10.8 Specialized Hardware: FPGA
10.9 Conclusion
References
11 Parallel Computing in the Analysis of Gene Expression Relationships
11.1 Significance of Gene Expression Analysis
11.2 Multivariate Gene Expression Relations
11.3 Classification Based on Gene Expression
11.4 Discussion and Future Directions
References
12 Assembling DNA Fragments with a Distributed Genetic Algorithm
12.1 Introduction
12.2 DNA Fragment Assembly Problem
12.3 DNA Fragment Assembly Using the Sequential GA
12.4 DNA Fragment Assembly Problem Using the Parallel GA
12.5 Experimental Results
12.6 Conclusions
References
13 A Cooperative Genetic Algorithm for Knowledge Discovery in Microarray Experiments
13.1 Introduction
13.2 Microarray Experiments
13.3 Association Rules
13.4 Multi-Objective Genetic Algorithm
13.5 Cooperative Multi-Objective Genetic Algorithm (PMGA)
13.6 Experiments
13.7 Conclusion
References
Part III Phylogenetics
14 Parallel and Distributed Computation of Large Phylogenetic Trees
14.1 Introduction
14.2 Maximum Likelihood
14.3 State-of-the-Art ML Programs
14.4 Algorithmic Solutions in RAxML-III
14.5 HPC Solutions in RAxML-III
14.6 Future Developments
References
15 Phylogenetic Parameter Estimation on COWs
15.1 Introduction
15.2 Phylogenetic Tree Reconstruction using Quartet Puzzling
15.3 Hardware, Data, and Scheduling Algorithms
15.4 Parallelizing PEst
15.5 Extending Parallel Coverage in PEst
15.6 Discussion
References
16 High-Performance Phylogeny Reconstruction Under Maximum Parsimony
16.1 Introduction
16.2 Maximum Parsimony
16.3 Exact MP: Parallel Branch and Bound
16.4 MP Heuristics: Disk-Covering Methods
16.5 Summary and Open Problems
References
Part IV Protein Folding
17 Protein Folding with the Parallel Replica Exchange Molecular Dynamics Method
17.1 Introduction
17.2 REMD Method
17.3 Protein Folding with REMD
17.4 Protein Structure Refinement with REMD
17.5 Summary
Reference
18 High-Performance Alignment Methods for Protein Threading
18.1 Introduction
18.2 Formal Definition
18.3 Mixed Integer Programming Models
18.4 Divide-and-Conquer Technique
18.5 Parallelization
18.6 Future Research Directions
18.7 Conclusion
18.8 Summary
References
19 Parallel Evolutionary Computations in Discerning Protein Structures
19.1 Introduction
19.2 PSP Problem
19.3 Protein Structure Discerning Methods
19.4 PSP Energy Minimization EAs
19.5 PSP Parallel EA Performance Evaluation
19.6 Results and Discussion
19.7 Conclusions and Suggested Research
References
Part V Platforms and Enabling Technologies
20 A Brief Overview of Grid Activities for Bioinformatics and Health Applications
20.1 Introduction
20.2 Grid Computing
20.3 Bioinformatics and Health Applications
20.4 Grid Computing for Bioinformatics and Health Applications
20.5 Grid Activities in Europe
20.6 Grid Activities in the United Kingdom
20.7 Grid Activities in the USA
20.8 Grid Activities in Asia and Japan
20.9 International Grid Collaborations
20.10 International Grid Collaborations
20.11 Conclusions and Future Trends
References
21 Parallel Algorithms for Bioinformatics
21.1 Introduction
21.2 Parallel Computer Architecture
21.3 Bioinformatics Algorithms on the Cray MTA System
21.4 Summary
References
22 Cluster and Grid Infrastructure for Computational Chemistry and Biochemistry
22.1 Introduction
22.2 GAMESS Execution on Clusters
22.3 Portal Technology
22.4 Running GAMESS with Nimrod Grid-Enabling Infrastructure
22.5 Computational Chemistry Workflow Environments
22.6 Conclusions
References
23 Distributed Workflows in Bioinformatics
23.1 Introduction
23.2 Challenges of Grid Computing
23.3 Grid Applications
23.4 Grid Programming
23.5 Grid Execution Language
23.6 GUI-Based Workflow Construction and Execution
23.7 Case Studies
23.8 Summary
References
24 Molecular Structure Determination on a Computational and Data Grid
24.1 Introduction
24.2 Molecular Structure Determination
24.3 Grid Computing in Buffalo
24.4 Center for Computational Research
24.5 ACDC-Grid Overview
24.6 Grid Research Collaborations
24.7 Grid Research Advancements
24.8 Grid Research Application Abstractions and Tools
24.9 Conclusions
References
25 GIPSY: A Problem-Solving Environment for Bioinformatics Applications
25.1 Introduction
25.2 Architecture
25.3 Currently Deployed Applications
25.4 Conclusion
References
26 TaskSpaces: A Software Framework for Parallel Bioinformatics on Computational Grids
26.1 Introduction
26.2 The TaskSpaces Framework
26.3 Application: Finding Correctly Folded RNA Motifs
26.4 Case Study: Operating the Framework on a Computational Grid
26.5 Results for the RNA Motif Problem
26.6 Future Work
26.7 Summary and Conclusion
References
27 The Organic Grid: Self-Organizing Computational Biology on Desktop Grids
27.1 Introduction
27.2 Background and Related Work
27.3 Measurements
27.4 Conclusions
27.5 Future Directions
References
28 FPGA Computing in Modern Bioinformatics
28.1 Parallel Processing Models
28.2 Image Processing Task
28.3 FPGA Hardware Accelerators
28.4 Image Processing Example
28.5 Case Study: Protein Structure Prediction
28.6 Conclusion
References
29 Virtual Microscopy: Distributed Image Storage, Retrieval, Analysis, and Visualization
29.1 Introduction
29.2 Architecture
29.3 Image Analysis
29.4 Clinical Use
29.5 Education
29.6 Future Directions
29.7 Summary
References
Index