Availability
Item Barcode | Call Number | Material Type | Item Category | Status
---|---|---|---|---
30000010200260 | QD462.6.D38 J36 2008 | Open Access Book | Book | On Order
Summary
An In-Depth View of Hardware Issues, Programming Practices, and Implementation of Key Methods
Exploring the challenges of parallel programming from the perspective of quantum chemists, Parallel Computing in Quantum Chemistry thoroughly covers topics relevant to designing and implementing parallel quantum chemistry programs.
Focusing on good parallel program design and performance analysis, the first part of the book deals with parallel computer architectures and parallel computing concepts and terminology. The authors discuss trends in hardware, methods, and algorithms; parallel computer architectures and the overall system view of a parallel computer; message-passing; parallelization via multi-threading; measures for predicting and assessing the performance of parallel algorithms; and fundamental issues of designing and implementing parallel programs.
The second part contains detailed discussions and performance analyses of parallel algorithms for a number of important and widely used quantum chemistry procedures and methods. The book presents schemes for the parallel computation of two-electron integrals, details the Hartree-Fock procedure, considers the parallel computation of second-order Møller-Plesset energies, and examines the difficulties of parallelizing local correlation methods.
Through a solid assessment of parallel computing hardware issues, parallel programming practices, and implementation of key methods, this invaluable book enables readers to develop efficient quantum chemistry software capable of utilizing large-scale parallel computers.
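To give a flavor of the message-passing style taught in the first part of the book, the following is a minimal MPI example in C of the all-to-one reduction named in Chapter 3. It is a generic illustrative sketch, not code from the book:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank;  /* each process contributes one value */
    double sum = 0.0;

    /* All-to-one reduction: sum the local values onto rank 0 */
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %f\n", size, sum);

    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched on four processes, rank 0 prints the sum 6 (0 + 1 + 2 + 3).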
Author Notes
Janssen, Curtis L.; Nielsen, Ida M. B.
Table of Contents
I Parallel Computing Concepts and Terminology | |
1 Introduction | p. 3 |
1.1 Parallel Computing in Quantum Chemistry: Past and Present | p. 4 |
1.2 Trends in Hardware Development | p. 5 |
1.2.1 Moore's Law | p. 5 |
1.2.2 Clock Speed and Performance | p. 6 |
1.2.3 Bandwidth and Latency | p. 7 |
1.2.4 Supercomputer Performance | p. 8 |
1.3 Trends in Parallel Software Development | p. 10 |
1.3.1 Responding to Changes in Hardware | p. 10 |
1.3.2 New Algorithms and Methods | p. 10 |
1.3.3 New Programming Models | p. 12 |
References | p. 13 |
2 Parallel Computer Architectures | p. 17 |
2.1 Flynn's Classification Scheme | p. 17 |
2.1.1 Single-Instruction, Single-Data | p. 17 |
2.1.2 Single-Instruction, Multiple-Data | p. 18 |
2.1.3 Multiple-Instruction, Multiple-Data | p. 18 |
2.2 Network Architecture | p. 19 |
2.2.1 Direct and Indirect Networks | p. 19 |
2.2.2 Routing | p. 20 |
2.2.3 Network Performance | p. 23 |
2.2.4 Network Topology | p. 25 |
2.2.4.1 Crossbar | p. 26 |
2.2.4.2 Ring | p. 27 |
2.2.4.3 Mesh and Torus | p. 27 |
2.2.4.4 Hypercube | p. 28 |
2.2.4.5 Fat Tree | p. 28 |
2.2.4.6 Bus | p. 30 |
2.2.4.7 Ad Hoc Grid | p. 31 |
2.3 Node Architecture | p. 31 |
2.4 MIMD System Architecture | p. 34 |
2.4.1 Memory Hierarchy | p. 35 |
2.4.2 Persistent Storage | p. 35 |
2.4.2.1 Local Storage | p. 37 |
2.4.2.2 Network Storage | p. 37 |
2.4.2.3 Trends in Storage | p. 38 |
2.4.3 Reliability | p. 38 |
2.4.4 Homogeneity and Heterogeneity | p. 39 |
2.4.5 Commodity versus Custom Computers | p. 40 |
2.5 Further Reading | p. 42 |
References | p. 43 |
3 Communication via Message-Passing | p. 45 |
3.1 Point-to-Point Communication Operations | p. 46 |
3.1.1 Blocking Point-to-Point Operations | p. 46 |
3.1.2 Non-Blocking Point-to-Point Operations | p. 47 |
3.2 Collective Communication Operations | p. 49 |
3.2.1 One-to-All Broadcast | p. 50 |
3.2.2 All-to-All Broadcast | p. 51 |
3.2.3 All-to-One Reduction and All-Reduce | p. 54 |
3.3 One-Sided Communication Operations | p. 55 |
3.4 Further Reading | p. 56 |
References | p. 56 |
4 Multi-Threading | p. 59 |
4.1 Pitfalls of Multi-Threading | p. 61 |
4.2 Thread-Safety | p. 64 |
4.3 Comparison of Multi-Threading and Message-Passing | p. 65 |
4.4 Hybrid Programming | p. 66 |
4.5 Further Reading | p. 69 |
References | p. 70 |
5 Parallel Performance Evaluation | p. 71 |
5.1 Network Performance Characteristics | p. 71 |
5.2 Performance Measures for Parallel Programs | p. 74 |
5.2.1 Speedup and Efficiency | p. 74 |
5.2.2 Scalability | p. 79 |
5.3 Performance Modeling | p. 80 |
5.3.1 Modeling the Execution Time | p. 80 |
5.3.2 Performance Model Example: Matrix-Vector Multiplication | p. 83 |
5.4 Presenting and Evaluating Performance Data: A Few Caveats | p. 86 |
5.5 Further Reading | p. 90 |
References | p. 90 |
6 Parallel Program Design | p. 93 |
6.1 Distribution of Work | p. 94 |
6.1.1 Static Task Distribution | p. 95 |
6.1.1.1 Round-Robin and Recursive Task Distributions | p. 96 |
6.1.2 Dynamic Task Distribution | p. 99 |
6.1.2.1 Manager-Worker Model | p. 99 |
6.1.2.2 Decentralized Task Distribution | p. 101 |
6.2 Distribution of Data | p. 101 |
6.3 Designing a Communication Scheme | p. 104 |
6.3.1 Using Collective Communication | p. 104 |
6.3.2 Using Point-to-Point Communication | p. 105 |
6.4 Design Example: Matrix-Vector Multiplication | p. 107 |
6.4.1 Using a Row-Distributed Matrix | p. 108 |
6.4.2 Using a Block-Distributed Matrix | p. 109 |
6.5 Summary of Key Points of Parallel Program Design | p. 112 |
6.6 Further Reading | p. 114 |
References | p. 114 |
II Applications of Parallel Programming in Quantum Chemistry | |
7 Two-Electron Integral Evaluation | p. 117 |
7.1 Basics of Integral Computation | p. 117 |
7.2 Parallel Implementation Using Static Load Balancing | p. 119 |
7.2.1 Parallel Algorithms Distributing Shell Quartets and Pairs | p. 119 |
7.2.2 Performance Analysis | p. 121 |
7.2.2.1 Determination of the Load Imbalance Factor k(p) | p. 122 |
7.2.2.2 Determination of μ and σ for Integral Computation | p. 123 |
7.2.2.3 Predicted and Measured Efficiencies | p. 124 |
7.3 Parallel Implementation Using Dynamic Load Balancing | p. 125 |
7.3.1 Parallel Algorithm Distributing Shell Pairs | p. 126 |
7.3.2 Performance Analysis | p. 128 |
7.3.2.1 Load Imbalance | p. 128 |
7.3.2.2 Communication Time | p. 128 |
7.3.2.3 Predicted and Measured Efficiencies | p. 129 |
References | p. 130 |
8 The Hartree-Fock Method | p. 131 |
8.1 The Hartree-Fock Equations | p. 131 |
8.2 The Hartree-Fock Procedure | p. 133 |
8.3 Parallel Fock Matrix Formation with Replicated Data | p. 135 |
8.4 Parallel Fock Matrix Formation with Distributed Data | p. 138 |
8.5 Further Reading | p. 145 |
References | p. 146 |
9 Second-Order Møller-Plesset Perturbation Theory | p. 147 |
9.1 The Canonical MP2 Equations | p. 147 |
9.2 A Scalar Direct MP2 Algorithm | p. 149 |
9.3 Parallelization with Minimal Modifications | p. 151 |
9.4 High-Performance Parallelization | p. 154 |
9.5 Performance of the Parallel Algorithms | p. 158 |
9.6 Further Reading | p. 164 |
References | p. 164 |
10 Local Møller-Plesset Perturbation Theory | p. 167 |
10.1 The LMP2 Equations | p. 167 |
10.2 A Scalar LMP2 Algorithm | p. 169 |
10.3 Parallel LMP2 | p. 170 |
10.3.1 Two-Electron Integral Transformation | p. 171 |
10.3.2 Computation of the Residual | p. 173 |
10.3.3 Parallel Performance | p. 174 |
References | p. 177 |
Appendices | |
A A Brief Introduction to MPI | p. 181 |
B Pthreads: Explicit Use of Threads | p. 189 |
C OpenMP: Compiler Extensions for Multi-Threading | p. 195 |
Index | p. 205 |