Item Barcode | Call Number | Material Type
---|---|---
30000010197186 | QA76.642 C39 2009 | Open Access Book
30000003503780 | QA76.642 C39 2009 | Open Access Book
30000003503749 | QA76.642 C39 2009 | Open Access Book
Summary
Focusing on algorithms for distributed-memory parallel architectures, Parallel Algorithms presents a rigorous yet accessible treatment of theoretical models of parallel computation, parallel algorithm design for homogeneous and heterogeneous platforms, complexity and performance analysis, and essential notions of scheduling. The book extracts fundamental ideas and algorithmic principles from the mass of parallel algorithm expertise and practical implementations developed over the last few decades.
In the first section of the text, the authors cover two classical theoretical models of parallel computation (PRAMs and sorting networks), describe network models for topology and performance, and define several classical communication primitives. The next part deals with parallel algorithms on ring and grid logical topologies as well as the issue of load balancing on heterogeneous computing platforms. The final section presents basic results and approaches for common scheduling problems that arise when developing parallel algorithms. It also discusses advanced scheduling topics, such as divisible load scheduling and steady-state scheduling.
With numerous examples and exercises in each chapter, this text encompasses both the theoretical foundations of parallel algorithms and practical parallel algorithm design.
Author Notes
Henri Casanova, Arnaud Legrand, Yves Robert
Table of Contents
Preface | p. xiii |
I Models | p. 1 |
1 PRAM Model | p. 3 |
1.1 Pointer Jumping | p. 4 |
1.1.1 List Ranking | p. 5 |
1.1.2 Prefix Computation | p. 7 |
1.1.3 Euler Tour | p. 8 |
1.2 Performance Evaluation of PRAM Algorithms | p. 10 |
1.2.1 Cost, Work, Speedup, and Efficiency | p. 10 |
1.2.2 A Simple Simulation Result | p. 10 |
1.2.3 Brent's Theorem | p. 12 |
1.3 Comparison of PRAM Models | p. 12 |
1.3.1 Model Separation | p. 13 |
1.3.2 Simulation Theorem | p. 14 |
1.4 Sorting Machine | p. 15 |
1.4.1 Merge | p. 15 |
1.4.2 Sorting Trees | p. 18 |
1.4.3 Complexity and Correctness | p. 20 |
1.5 Relevance of the PRAM Model | p. 24 |
1.6 Exercises | p. 26 |
Matrix Multiplication | p. 26 |
Selection in a List | p. 26 |
Splitting an Array | p. 26 |
Looking for Roots in a Forest | p. 26 |
First Non-Zero Element | p. 27 |
Mystery Function | p. 27 |
Connected Components | p. 28 |
1.7 Answers | p. 30 |
2 Sorting Networks | p. 37 |
2.1 Odd-Even Merge Sort | p. 37 |
2.1.1 Odd-Even Merging Network | p. 38 |
2.1.2 Sorting Network | p. 41 |
2.1.3 0-1 Principle | p. 42 |
2.2 Sorting on a One-Dimensional Network | p. 44 |
2.2.1 Odd-Even Transposition Sort | p. 44 |
2.2.2 Odd-Even Sorting on a One-Dimensional Network | p. 47 |
2.3 Exercises | p. 49 |
Particular Sequences | p. 49 |
Bitonic Sorting Network | p. 49 |
Sorting on a Two-Dimensional Grid | p. 49 |
2.4 Answers | p. 52 |
3 Networking | p. 57 |
3.1 Interconnection Networks | p. 57 |
3.1.1 Topologies | p. 58 |
3.1.2 A Few Static Topologies | p. 59 |
3.1.3 Dynamic Topologies | p. 61 |
3.2 Communication Model | p. 62 |
3.2.1 A Simple Performance Model | p. 63 |
3.2.2 Point-to-Point Communication Protocols | p. 63 |
3.2.3 More Precise Models | p. 66 |
3.3 Case Study: The Unidirectional Ring | p. 72 |
3.3.1 Broadcast | p. 74 |
3.3.2 Scatter | p. 76 |
3.3.3 All-to-All | p. 77 |
3.3.4 Pipelined Broadcast | p. 78 |
3.4 Case Study: The Hypercube | p. 79 |
3.4.1 Labeling Vertices | p. 79 |
3.4.2 Paths and Routing in a Hypercube | p. 80 |
3.4.3 Embedding Rings and Grids into Hypercubes | p. 82 |
3.4.4 Collective Communications in a Hypercube | p. 83 |
3.5 Peer-to-Peer Computing | p. 87 |
3.5.1 Distributed Hash Tables and Structured Overlay Networks | p. 88 |
3.5.2 Chord | p. 90 |
3.5.3 Plaxton Routing Algorithm | p. 91 |
3.5.4 Multi-Casting in a Distributed Hash Table | p. 91 |
3.6 Exercises | p. 93 |
Torus, Hypercubes, and Binary Trees | p. 93 |
Torus, Hypercubes, and Binary Trees (again!) | p. 93 |
Cube-Connected Cycles | p. 93 |
Matrix Transposition | p. 94 |
Dynamically Switched Networks | p. 94 |
De Bruijn Network | p. 95 |
Gossip | p. 96 |
3.7 Answers | p. 97 |
II Parallel Algorithms | p. 103 |
4 Algorithms on a Ring of Processors | p. 105 |
4.1 Matrix-Vector Multiplication | p. 105 |
4.2 Matrix-Matrix Multiplication | p. 110 |
4.3 A First Look at Stencil Applications | p. 112 |
4.3.1 A Simple Sequential Stencil Algorithm | p. 112 |
4.3.2 Parallelizations of the Stencil Algorithm | p. 113 |
4.4 LU Factorization | p. 120 |
4.4.1 First Version | p. 120 |
4.4.2 Pipelining on the Ring | p. 126 |
4.4.3 Look-Ahead Algorithm | p. 128 |
4.5 A Second Look at Stencil Applications | p. 129 |
4.5.1 Parallelization on a Unidirectional Ring | p. 130 |
4.5.2 Parallelization on a Bidirectional Ring | p. 134 |
4.6 Implementing Logical Topologies | p. 135 |
4.7 Distributed vs. Centralized Implementations | p. 136 |
4.8 Summary of Algorithmic Principles | p. 138 |
4.9 Exercises | p. 139 |
Solving a Triangular System | p. 139 |
Givens Rotation | p. 139 |
Parallel Speedup | p. 140 |
MPI Matrix-Matrix Multiplication | p. 140 |
4.10 Answers | p. 142 |
5 Algorithms on Grids of Processors | p. 147 |
5.1 Logical Two-Dimensional Grid Topologies | p. 147 |
5.2 Communication on a Grid of Processors | p. 149 |
5.3 Matrix Multiplication on a Grid of Processors | p. 150 |
5.3.1 The Outer-Product Algorithm | p. 151 |
5.3.2 Grid vs. Ring? | p. 154 |
5.3.3 Three Matrix Multiplication Algorithms | p. 156 |
5.3.4 Performance Analysis of the Three Algorithms | p. 161 |
5.4 Two-Dimensional Block Cyclic Data Distribution | p. 167 |
5.5 Exercises | p. 169 |
Matrix Transposition | p. 169 |
Stencil Application | p. 169 |
Gauss-Jordan Method | p. 169 |
Outer-Product Algorithm in MPI | p. 169 |
5.6 Answers | p. 171 |
6 Load Balancing on Heterogeneous Platforms | p. 175 |
6.1 Load Balancing for One-Dimensional Data Distributions | p. 176 |
6.1.1 Basic Static Task Allocation Algorithm | p. 177 |
6.1.2 Incremental Algorithm | p. 179 |
6.1.3 Application to a Stencil Application | p. 181 |
6.1.4 Application to the LU Factorization | p. 184 |
6.2 Load Balancing for Two-Dimensional Data Distributions | p. 186 |
6.2.1 Matrix Multiplication on a Heterogeneous Grid | p. 186 |
6.2.2 On the Hardness of the Two-Dimensional Data Partitioning Problem | p. 189 |
6.3 Free Two-Dimensional Partitioning on a Heterogeneous Grid | p. 190 |
6.3.1 General Problem | p. 190 |
6.3.2 A Guaranteed Heuristic | p. 194 |
6.4 Exercises | p. 201 |
Matrix Product on a Heterogeneous Ring | p. 201 |
LU Factorization on a Heterogeneous Grid | p. 201 |
Stencil Application on a Heterogeneous Ring | p. 202 |
6.5 Answers | p. 203 |
III Scheduling | p. 207 |
7 Scheduling | p. 209 |
7.1 Introduction | p. 209 |
7.1.1 Where Do Task Graphs Come From? | p. 209 |
7.1.2 Overview | p. 211 |
7.2 Scheduling Task Graphs | p. 212 |
7.3 Solving Pb(∞) | p. 216 |
7.4 Solving Pb(p) | p. 218 |
7.4.1 NP-Completeness of Pb(p) | p. 218 |
7.4.2 List Scheduling Heuristics | p. 220 |
7.4.3 Implementing a List Schedule | p. 223 |
7.4.4 Critical Path Scheduling | p. 225 |
7.4.5 Scheduling Independent Tasks | p. 226 |
7.5 Taking Communication Costs into Account | p. 231 |
7.6 Pb(∞) with Communications | p. 232 |
7.6.1 NP-Completeness of Pb(∞) | p. 234 |
7.6.2 A Guaranteed Heuristic for Pb(∞) | p. 236 |
7.7 List Heuristics for Pb(p) with Communications | p. 240 |
7.7.1 Naive Critical Path | p. 240 |
7.7.2 Modified Critical Path | p. 241 |
7.7.3 Hints for Comparison | p. 242 |
7.8 Extension to Heterogeneous Platforms | p. 244 |
7.9 Exercises | p. 247 |
Free Schedule | p. 247 |
List Scheduling Anomalies | p. 248 |
Scheduling a FORK Graph with Communications | p. 248 |
Hu's Algorithm | p. 249 |
7.10 Answers | p. 251 |
8 Advanced Scheduling | p. 255 |
8.1 Divisible Load Scheduling | p. 256 |
8.1.1 Motivation | p. 256 |
8.1.2 Classical Approach | p. 257 |
8.1.3 Divisible Load Approach | p. 261 |
8.1.4 Extension to Star Networks | p. 264 |
8.1.5 Going Further | p. 269 |
8.2 Steady-State Scheduling | p. 271 |
8.2.1 Motivation | p. 271 |
8.2.2 Working Out an Example | p. 272 |
8.2.3 Star-Shaped Platforms | p. 274 |
8.2.4 Tree-Shaped Platforms | p. 278 |
8.2.5 Asymptotic Optimality | p. 279 |
8.2.6 Summary | p. 281 |
8.3 Workflow Scheduling | p. 282 |
8.3.1 Framework | p. 283 |
8.3.2 Period Minimization | p. 286 |
8.3.3 Response Time Minimization | p. 295 |
8.4 Hyperplane Scheduling (or Scheduling at Compile-Time) | p. 296 |
8.4.1 Uniform Loop Nests | p. 296 |
8.4.2 Lamport's Hyperplane Method | p. 301 |
8.5 Exercises | p. 307 |
Divisible Load Scheduling on Heterogeneous Trees | p. 307 |
Steady-State Scheduling of Multiple Applications | p. 308 |
Chains-to-Chains | p. 309 |
Response Time of One-to-One Mappings | p. 309 |
Dependence Analysis and Scheduling Vectors | p. 310 |
Dependence Removal | p. 310 |
8.6 Answers | p. 312 |
Bibliography | p. 323 |
Index | p. 333 |