Item Barcode | Call Number | Material Type
---|---|---
30000010197186 | QA76.642 C39 2009 | Open Access Book
30000003503780 | QA76.642 C39 2009 | Open Access Book
30000003503749 | QA76.642 C39 2009 | Open Access Book
Summary
Focusing on algorithms for distributed-memory parallel architectures, Parallel Algorithms presents a rigorous yet accessible treatment of theoretical models of parallel computation, parallel algorithm design for homogeneous and heterogeneous platforms, complexity and performance analysis, and essential notions of scheduling. The book extracts fundamental ideas and algorithmic principles from the mass of parallel algorithm expertise and practical implementations developed over the last few decades.
In the first section of the text, the authors cover two classical theoretical models of parallel computation (PRAMs and sorting networks), describe network models for topology and performance, and define several classical communication primitives. The next part deals with parallel algorithms on ring and grid logical topologies as well as the issue of load balancing on heterogeneous computing platforms. The final section presents basic results and approaches for common scheduling problems that arise when developing parallel algorithms. It also discusses advanced scheduling topics, such as divisible load scheduling and steady-state scheduling.
With numerous examples and exercises in each chapter, this text encompasses both the theoretical foundations of parallel algorithms and practical parallel algorithm design.
Author Notes
Henri Casanova, Arnaud Legrand, Yves Robert
Table of Contents
Preface | p. xiii |
I Models | p. 1 |
1 PRAM Model | p. 3 |
1.1 Pointer Jumping | p. 4 |
1.1.1 List Ranking | p. 5 |
1.1.2 Prefix Computation | p. 7 |
1.1.3 Euler Tour | p. 8 |
1.2 Performance Evaluation of PRAM Algorithms | p. 10 |
1.2.1 Cost, Work, Speedup, and Efficiency | p. 10 |
1.2.2 A Simple Simulation Result | p. 10 |
1.2.3 Brent's Theorem | p. 12 |
1.3 Comparison of PRAM Models | p. 12 |
1.3.1 Model Separation | p. 13 |
1.3.2 Simulation Theorem | p. 14 |
1.4 Sorting Machine | p. 15 |
1.4.1 Merge | p. 15 |
1.4.2 Sorting Trees | p. 18 |
1.4.3 Complexity and Correctness | p. 20 |
1.5 Relevance of the PRAM Model | p. 24 |
1.6 Exercises | p. 26 |
Matrix Multiplication | p. 26 |
Selection in a List | p. 26 |
Splitting an Array | p. 26 |
Looking for Roots in a Forest | p. 26 |
First Non-Zero Element | p. 27 |
Mystery Function | p. 27 |
Connected Components | p. 28 |
1.7 Answers | p. 30 |
2 Sorting Networks | p. 37 |
2.1 Odd-Even Merge Sort | p. 37 |
2.1.1 Odd-Even Merging Network | p. 38 |
2.1.2 Sorting Network | p. 41 |
2.1.3 0-1 Principle | p. 42 |
2.2 Sorting on a One-Dimensional Network | p. 44 |
2.2.1 Odd-Even Transposition Sort | p. 44 |
2.2.2 Odd-Even Sorting on a One-Dimensional Network | p. 47 |
2.3 Exercises | p. 49 |
Particular Sequences | p. 49 |
Bitonic Sorting Network | p. 49 |
Sorting on a Two-Dimensional Grid | p. 49 |
2.4 Answers | p. 52 |
3 Networking | p. 57 |
3.1 Interconnection Networks | p. 57 |
3.1.1 Topologies | p. 58 |
3.1.2 A Few Static Topologies | p. 59 |
3.1.3 Dynamic Topologies | p. 61 |
3.2 Communication Model | p. 62 |
3.2.1 A Simple Performance Model | p. 63 |
3.2.2 Point-to-Point Communication Protocols | p. 63 |
3.2.3 More Precise Models | p. 66 |
3.3 Case Study: The Unidirectional Ring | p. 72 |
3.3.1 Broadcast | p. 74 |
3.3.2 Scatter | p. 76 |
3.3.3 All-to-All | p. 77 |
3.3.4 Pipelined Broadcast | p. 78 |
3.4 Case Study: The Hypercube | p. 79 |
3.4.1 Labeling Vertices | p. 79 |
3.4.2 Paths and Routing in a Hypercube | p. 80 |
3.4.3 Embedding Rings and Grids into Hypercubes | p. 82 |
3.4.4 Collective Communications in a Hypercube | p. 83 |
3.5 Peer-to-Peer Computing | p. 87 |
3.5.1 Distributed Hash Tables and Structured Overlay Networks | p. 88 |
3.5.2 Chord | p. 90 |
3.5.3 Plaxton Routing Algorithm | p. 91 |
3.5.4 Multi-Casting in a Distributed Hash Table | p. 91 |
3.6 Exercises | p. 93 |
Torus, Hypercubes, and Binary Trees | p. 93 |
Torus, Hypercubes, and Binary Trees (again!) | p. 93 |
Cube-Connected Cycles | p. 93 |
Matrix Transposition | p. 94 |
Dynamically Switched Networks | p. 94 |
De Bruijn Network | p. 95 |
Gossip | p. 96 |
3.7 Answers | p. 97 |
II Parallel Algorithms | p. 103 |
4 Algorithms on a Ring of Processors | p. 105 |
4.1 Matrix-Vector Multiplication | p. 105 |
4.2 Matrix-Matrix Multiplication | p. 110 |
4.3 A First Look at Stencil Applications | p. 112 |
4.3.1 A Simple Sequential Stencil Algorithm | p. 112 |
4.3.2 Parallelizations of the Stencil Algorithm | p. 113 |
4.4 LU Factorization | p. 120 |
4.4.1 First Version | p. 120 |
4.4.2 Pipelining on the Ring | p. 126 |
4.4.3 Look-Ahead Algorithm | p. 128 |
4.5 A Second Look at Stencil Applications | p. 129 |
4.5.1 Parallelization on a Unidirectional Ring | p. 130 |
4.5.2 Parallelization on a Bidirectional Ring | p. 134 |
4.6 Implementing Logical Topologies | p. 135 |
4.7 Distributed vs. Centralized Implementations | p. 136 |
4.8 Summary of Algorithmic Principles | p. 138 |
4.9 Exercises | p. 139 |
Solving a Triangular System | p. 139 |
Givens Rotation | p. 139 |
Parallel Speedup | p. 140 |
MPI Matrix-Matrix Multiplication | p. 140 |
4.10 Answers | p. 142 |
5 Algorithms on Grids of Processors | p. 147 |
5.1 Logical Two-Dimensional Grid Topologies | p. 147 |
5.2 Communication on a Grid of Processors | p. 149 |
5.3 Matrix Multiplication on a Grid of Processors | p. 150 |
5.3.1 The Outer-Product Algorithm | p. 151 |
5.3.2 Grid vs. Ring? | p. 154 |
5.3.3 Three Matrix Multiplication Algorithms | p. 156 |
5.3.4 Performance Analysis of the Three Algorithms | p. 161 |
5.4 Two-Dimensional Block Cyclic Data Distribution | p. 167 |
5.5 Exercises | p. 169 |
Matrix Transposition | p. 169 |
Stencil Application | p. 169 |
Gauss-Jordan Method | p. 169 |
Outer-Product Algorithm in MPI | p. 169 |
5.6 Answers | p. 171 |
6 Load Balancing on Heterogeneous Platforms | p. 175 |
6.1 Load Balancing for One-Dimensional Data Distributions | p. 176 |
6.1.1 Basic Static Task Allocation Algorithm | p. 177 |
6.1.2 Incremental Algorithm | p. 179 |
6.1.3 Application to a Stencil Application | p. 181 |
6.1.4 Application to the LU Factorization | p. 184 |
6.2 Load Balancing for Two-Dimensional Data Distributions | p. 186 |
6.2.1 Matrix Multiplication on a Heterogeneous Grid | p. 186 |
6.2.2 On the Hardness of the Two-Dimensional Data Partitioning Problem | p. 189 |
6.3 Free Two-Dimensional Partitioning on a Heterogeneous Grid | p. 190 |
6.3.1 General Problem | p. 190 |
6.3.2 A Guaranteed Heuristic | p. 194 |
6.4 Exercises | p. 201 |
Matrix Product on a Heterogeneous Ring | p. 201 |
LU Factorization on a Heterogeneous Grid | p. 201 |
Stencil Application on a Heterogeneous Ring | p. 202 |
6.5 Answers | p. 203 |
III Scheduling | p. 207 |
7 Scheduling | p. 209 |
7.1 Introduction | p. 209 |
7.1.1 Where Do Task Graphs Come From? | p. 209 |
7.1.2 Overview | p. 211 |
7.2 Scheduling Task Graphs | p. 212 |
7.3 Solving Pb(∞) | p. 216 |
7.4 Solving Pb(p) | p. 218 |
7.4.1 NP-Completeness of Pb(p) | p. 218 |
7.4.2 List Scheduling Heuristics | p. 220 |
7.4.3 Implementing a List Schedule | p. 223 |
7.4.4 Critical Path Scheduling | p. 225 |
7.4.5 Scheduling Independent Tasks | p. 226 |
7.5 Taking Communication Costs into Account | p. 231 |
7.6 Pb(∞) with Communications | p. 232 |
7.6.1 NP-Completeness of Pb(∞) | p. 234 |
7.6.2 A Guaranteed Heuristic for Pb(∞) | p. 236 |
7.7 List Heuristics for Pb(p) with Communications | p. 240 |
7.7.1 Naive Critical Path | p. 240 |
7.7.2 Modified Critical Path | p. 241 |
7.7.3 Hints for Comparison | p. 242 |
7.8 Extension to Heterogeneous Platforms | p. 244 |
7.9 Exercises | p. 247 |
Free Schedule | p. 247 |
List Scheduling Anomalies | p. 248 |
Scheduling a FORK Graph with Communications | p. 248 |
Hu's Algorithm | p. 249 |
7.10 Answers | p. 251 |
8 Advanced Scheduling | p. 255 |
8.1 Divisible Load Scheduling | p. 256 |
8.1.1 Motivation | p. 256 |
8.1.2 Classical Approach | p. 257 |
8.1.3 Divisible Load Approach | p. 261 |
8.1.4 Extension to Star Networks | p. 264 |
8.1.5 Going Further | p. 269 |
8.2 Steady-State Scheduling | p. 271 |
8.2.1 Motivation | p. 271 |
8.2.2 Working Out an Example | p. 272 |
8.2.3 Star-Shaped Platforms | p. 274 |
8.2.4 Tree-Shaped Platforms | p. 278 |
8.2.5 Asymptotic Optimality | p. 279 |
8.2.6 Summary | p. 281 |
8.3 Workflow Scheduling | p. 282 |
8.3.1 Framework | p. 283 |
8.3.2 Period Minimization | p. 286 |
8.3.3 Response Time Minimization | p. 295 |
8.4 Hyperplane Scheduling (or Scheduling at Compile-Time) | p. 296 |
8.4.1 Uniform Loop Nests | p. 296 |
8.4.2 Lamport's Hyperplane Method | p. 301 |
8.5 Exercises | p. 307 |
Divisible Load Scheduling on Heterogeneous Trees | p. 307 |
Steady-State Scheduling of Multiple Applications | p. 308 |
Chains-to-Chains | p. 309 |
Response Time of One-to-One Mappings | p. 309 |
Dependence Analysis and Scheduling Vectors | p. 310 |
Dependence Removal | p. 310 |
8.6 Answers | p. 312 |
Bibliography | p. 323 |
Index | p. 333 |