Title:
Parallel algorithms
Personal Author:
Casanova, Henri
Series:
Chapman and Hall/CRC numerical analysis and scientific computing
Publication Information:
Boca Raton, FL : CRC Press, 2008
Physical Description:
xv, 337 p. : ill. ; 25 cm.
ISBN:
9781584889458
General Note:
A Chapman & Hall book

Available:

Item Barcode     Call Number         Material Type      Item Category 1
30000010197186   QA76.642 C39 2009   Open Access Book   Book
30000003503780   QA76.642 C39 2009   Open Access Book   Book
30000003503749   QA76.642 C39 2009   Open Access Book   Book

Summary

Focusing on algorithms for distributed-memory parallel architectures, Parallel Algorithms presents a rigorous yet accessible treatment of theoretical models of parallel computation, parallel algorithm design for homogeneous and heterogeneous platforms, complexity and performance analysis, and essential notions of scheduling. The book extracts fundamental ideas and algorithmic principles from the mass of parallel algorithm expertise and practical implementations developed over the last few decades.

In the first section of the text, the authors cover two classical theoretical models of parallel computation (PRAMs and sorting networks), describe network models for topology and performance, and define several classical communication primitives. The next part deals with parallel algorithms on ring and grid logical topologies as well as the issue of load balancing on heterogeneous computing platforms. The final section presents basic results and approaches for common scheduling problems that arise when developing parallel algorithms. It also discusses advanced scheduling topics, such as divisible load scheduling and steady-state scheduling.
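As a flavor of the PRAM techniques covered in the first part (list ranking by pointer jumping, Section 1.1.1), here is a minimal sequential simulation: each of the O(log n) rounds applies all per-node updates in lockstep, mimicking one synchronous step of a PRAM with one processor per node. The function name and list representation are illustrative choices, not taken from the book.

```python
def list_ranking(next_ptr):
    """Rank each node of a linked list (its distance to the tail) by pointer jumping.

    next_ptr[i] is the successor of node i; the tail points to itself.
    A PRAM would run each round with one processor per node; the lockstep
    semantics are simulated here by rebuilding both arrays from old values.
    """
    n = len(next_ptr)
    nxt = list(next_ptr)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = (n - 1).bit_length() if n > 1 else 0  # ceil(log2 n) rounds suffice
    for _ in range(rounds):
        # Both comprehensions read only the previous round's arrays,
        # exactly as in a synchronous PRAM step.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For the chain 0 → 1 → 2 → 3 (tail), `list_ranking([1, 2, 3, 3])` returns `[3, 2, 1, 0]` after only two rounds, illustrating the logarithmic round count that makes the technique interesting on a PRAM.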

With numerous examples and exercises in each chapter, this text encompasses both the theoretical foundations of parallel algorithms and practical parallel algorithm design.
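One concrete example of the book's subject matter is odd-even transposition sort (Section 2.2.1), which sorts n values on a one-dimensional network of n processors in n compare-exchange phases. The sketch below is a sequential simulation under that model, not code from the book: on real hardware all pairs within a phase exchange concurrently, giving O(n) parallel time.

```python
def odd_even_transposition_sort(values):
    """Simulate odd-even transposition sort on a linear processor array.

    Phase k compare-exchanges the pairs (i, i+1) with i of parity k mod 2.
    All pairs in a phase are independent, so a linear network executes each
    phase in one parallel step; n phases are enough to sort n values.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

For instance, `odd_even_transposition_sort([5, 3, 1, 4, 2])` yields `[1, 2, 3, 4, 5]` after five phases.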


Author Notes

Henri Casanova, Arnaud Legrand, Yves Robert


Table of Contents

Preface p. xiii
I Models p. 1
1 PRAM Model p. 3
1.1 Pointer Jumping p. 4
1.1.1 List Ranking p. 5
1.1.2 Prefix Computation p. 7
1.1.3 Euler Tour p. 8
1.2 Performance Evaluation of PRAM Algorithms p. 10
1.2.1 Cost, Work, Speedup, and Efficiency p. 10
1.2.2 A Simple Simulation Result p. 10
1.2.3 Brent's Theorem p. 12
1.3 Comparison of PRAM Models p. 12
1.3.1 Model Separation p. 13
1.3.2 Simulation Theorem p. 14
1.4 Sorting Machine p. 15
1.4.1 Merge p. 15
1.4.2 Sorting Trees p. 18
1.4.3 Complexity and Correctness p. 20
1.5 Relevance of the PRAM Model p. 24
1.6 Exercises p. 26
Matrix Multiplication p. 26
Selection in a List p. 26
Splitting an Array p. 26
Looking for Roots in a Forest p. 26
First Non-Zero Element p. 27
Mystery Function p. 27
Connected Components p. 28
1.7 Answers p. 30
2 Sorting Networks p. 37
2.1 Odd-Even Merge Sort p. 37
2.1.1 Odd-Even Merging Network p. 38
2.1.2 Sorting Network p. 41
2.1.3 0-1 Principle p. 42
2.2 Sorting on a One-Dimensional Network p. 44
2.2.1 Odd-Even Transposition Sort p. 44
2.2.2 Odd-Even Sorting on a One-Dimensional Network p. 47
2.3 Exercises p. 49
Particular Sequences p. 49
Bitonic Sorting Network p. 49
Sorting on a Two-Dimensional Grid p. 49
2.4 Answers p. 52
3 Networking p. 57
3.1 Interconnection Networks p. 57
3.1.1 Topologies p. 58
3.1.2 A Few Static Topologies p. 59
3.1.3 Dynamic Topologies p. 61
3.2 Communication Model p. 62
3.2.1 A Simple Performance Model p. 63
3.2.2 Point-to-Point Communication Protocols p. 63
3.2.3 More Precise Models p. 66
3.3 Case Study: The Unidirectional Ring p. 72
3.3.1 Broadcast p. 74
3.3.2 Scatter p. 76
3.3.3 All-to-All p. 77
3.3.4 Pipelined Broadcast p. 78
3.4 Case Study: The Hypercube p. 79
3.4.1 Labeling Vertices p. 79
3.4.2 Paths and Routing in a Hypercube p. 80
3.4.3 Embedding Rings and Grids into Hypercubes p. 82
3.4.4 Collective Communications in a Hypercube p. 83
3.5 Peer-to-Peer Computing p. 87
3.5.1 Distributed Hash Tables and Structured Overlay Networks p. 88
3.5.2 Chord p. 90
3.5.3 Plaxton Routing Algorithm p. 91
3.5.4 Multi-Casting in a Distributed Hash Table p. 91
3.6 Exercises p. 93
Torus, Hypercubes, and Binary Trees p. 93
Torus, Hypercubes, and Binary Trees (again!) p. 93
Cube-Connected Cycles p. 93
Matrix Transposition p. 94
Dynamically Switched Networks p. 94
De Bruijn Network p. 95
Gossip p. 96
3.7 Answers p. 97
II Parallel Algorithms p. 103
4 Algorithms on a Ring of Processors p. 105
4.1 Matrix-Vector Multiplication p. 105
4.2 Matrix-Matrix Multiplication p. 110
4.3 A First Look at Stencil Applications p. 112
4.3.1 A Simple Sequential Stencil Algorithm p. 112
4.3.2 Parallelizations of the Stencil Algorithm p. 113
4.4 LU Factorization p. 120
4.4.1 First Version p. 120
4.4.2 Pipelining on the Ring p. 126
4.4.3 Look-Ahead Algorithm p. 128
4.5 A Second Look at Stencil Applications p. 129
4.5.1 Parallelization on a Unidirectional Ring p. 130
4.5.2 Parallelization on a Bidirectional Ring p. 134
4.6 Implementing Logical Topologies p. 135
4.7 Distributed vs. Centralized Implementations p. 136
4.8 Summary of Algorithmic Principles p. 138
4.9 Exercises p. 139
Solving a Triangular System p. 139
Givens Rotation p. 139
Parallel Speedup p. 140
MPI Matrix-Matrix Multiplication p. 140
4.10 Answers p. 142
5 Algorithms on Grids of Processors p. 147
5.1 Logical Two-Dimensional Grid Topologies p. 147
5.2 Communication on a Grid of Processors p. 149
5.3 Matrix Multiplication on a Grid of Processors p. 150
5.3.1 The Outer-Product Algorithm p. 151
5.3.2 Grid vs. Ring? p. 154
5.3.3 Three Matrix Multiplication Algorithms p. 156
5.3.4 Performance Analysis of the Three Algorithms p. 161
5.4 Two-Dimensional Block Cyclic Data Distribution p. 167
5.5 Exercises p. 169
Matrix Transposition p. 169
Stencil Application p. 169
Gauss-Jordan Method p. 169
Outer-Product Algorithm in MPI p. 169
5.6 Answers p. 171
6 Load Balancing on Heterogeneous Platforms p. 175
6.1 Load Balancing for One-Dimensional Data Distributions p. 176
6.1.1 Basic Static Task Allocation Algorithm p. 177
6.1.2 Incremental Algorithm p. 179
6.1.3 Application to a Stencil Application p. 181
6.1.4 Application to the LU Factorization p. 184
6.2 Load Balancing for Two-Dimensional Data Distributions p. 186
6.2.1 Matrix Multiplication on a Heterogeneous Grid p. 186
6.2.2 On the Hardness of the Two-Dimensional Data Partitioning Problem p. 189
6.3 Free Two-Dimensional Partitioning on a Heterogeneous Grid p. 190
6.3.1 General Problem p. 190
6.3.2 A Guaranteed Heuristic p. 194
6.4 Exercises p. 201
Matrix Product on a Heterogeneous Ring p. 201
LU Factorization on a Heterogeneous Grid p. 201
Stencil Application on a Heterogeneous Ring p. 202
6.5 Answers p. 203
III Scheduling p. 207
7 Scheduling p. 209
7.1 Introduction p. 209
7.1.1 Where Do Task Graphs Come From? p. 209
7.1.2 Overview p. 211
7.2 Scheduling Task Graphs p. 212
7.3 Solving Pb(∞) p. 216
7.4 Solving Pb(p) p. 218
7.4.1 NP-Completeness of Pb(p) p. 218
7.4.2 List Scheduling Heuristics p. 220
7.4.3 Implementing a List Schedule p. 223
7.4.4 Critical Path Scheduling p. 225
7.4.5 Scheduling Independent Tasks p. 226
7.5 Taking Communication Costs into Account p. 231
7.6 Pb(∞) with Communications p. 232
7.6.1 NP-Completeness of Pb(∞) p. 234
7.6.2 A Guaranteed Heuristic for Pb(∞) p. 236
7.7 List Heuristics for Pb(p) with Communications p. 240
7.7.1 Naive Critical Path p. 240
7.7.2 Modified Critical Path p. 241
7.7.3 Hints for Comparison p. 242
7.8 Extension to Heterogeneous Platforms p. 244
7.9 Exercises p. 247
Free Schedule p. 247
List Scheduling Anomalies p. 248
Scheduling a FORK Graph with Communications p. 248
Hu's Algorithm p. 249
7.10 Answers p. 251
8 Advanced Scheduling p. 255
8.1 Divisible Load Scheduling p. 256
8.1.1 Motivation p. 256
8.1.2 Classical Approach p. 257
8.1.3 Divisible Load Approach p. 261
8.1.4 Extension to Star Networks p. 264
8.1.5 Going Further p. 269
8.2 Steady-State Scheduling p. 271
8.2.1 Motivation p. 271
8.2.2 Working Out an Example p. 272
8.2.3 Star-Shaped Platforms p. 274
8.2.4 Tree-Shaped Platforms p. 278
8.2.5 Asymptotic Optimality p. 279
8.2.6 Summary p. 281
8.3 Workflow Scheduling p. 282
8.3.1 Framework p. 283
8.3.2 Period Minimization p. 286
8.3.3 Response Time Minimization p. 295
8.4 Hyperplane Scheduling (or Scheduling at Compile-Time) p. 296
8.4.1 Uniform Loop Nests p. 296
8.4.2 Lamport's Hyperplane Method p. 301
8.5 Exercises p. 307
Divisible Load Scheduling on Heterogeneous Trees p. 307
Steady-State Scheduling of Multiple Applications p. 308
Chains-to-Chains p. 309
Response Time of One-to-One Mappings p. 309
Dependence Analysis and Scheduling Vectors p. 310
Dependence Removal p. 310
8.6 Answers p. 312
Bibliography p. 323
Index p. 333