Title:
Pipelined processor farms : structured design for embedded parallel systems
Personal Author:
Fleury, Martin
Series:
Wiley series on parallel and distributed computing
Publication Information:
New York, N.Y. : Wiley-Interscience, 2001
ISBN:
9780471388609
Added Author:
Downton, Andrew
Available:

Item Barcode: 30000010045609
Call Number: QA76.6 F55 2001
Material Type: Open Access Book
Item Category 1: Book

Summary

This book outlines a methodology for the use of parallel processing in real-time systems. It provides an introduction to parallel processing in general, and to embedded systems in particular. Among the embedded systems covered are processors in such applications as automobiles, various machinery, FPGAs (field-programmable gate arrays), multimedia embedded systems such as those used in the computer game industry, and more.
* Presents design and simulation tools as well as case studies.
* First presentation of this material in book form.


Author Notes

Martin Fleury is a lecturer, and Andrew Downton a professor, in the Department of Electronic Systems Engineering at the University of Essex, England.


Table of Contents

Foreword p. v
Preface p. vii
Acknowledgments p. ix
Acronyms p. xix
Part I Introduction and Basic Concepts
1 Introduction p. 1
1.1 Overview p. 1
1.2 Origins p. 2
1.3 Amdahl's Law and Structured Parallel Design p. 4
1.4 Introduction to PPF Systems p. 4
1.5 Conclusions p. 8
Appendix p. 10
A.1 Simple Design Example: The H.261 Decoder p. 10
2 Basic Concepts p. 17
2.1 Pipelined Processing p. 20
2.2 Pipeline Types p. 24
2.2.1 Asynchronous PPF p. 25
2.2.2 Synchronous PPF p. 26
2.3 Data Farming and Demand-based Scheduling p. 27
2.4 Data-farm Performance Criteria p. 28
2.5 Conclusion p. 30
Appendix p. 31
A.1 Short case studies p. 31
3 PPF in Practice p. 37
3.1 Application Overview p. 38
3.1.1 Implementation issues p. 39
3.2 Parallelization of the Postcode Recognizer p. 39
3.2.1 Partitioning the postcode recognizer p. 40
3.2.2 Scaling the postcode recognizer p. 41
3.2.3 Performance achieved p. 43
3.3 Parallelization of the Address Verifier p. 47
3.3.1 Partitioning the address verifier p. 47
3.3.2 Scaling the address verifier p. 49
3.3.3 Address verification farms p. 50
3.3.4 Overall performance achieved p. 50
3.4 Meeting the Specification p. 51
3.5 Conclusion p. 53
Appendix p. 53
A.1 Other Parallel Postcode Recognition Systems p. 53
4 Development of PPF Applications p. 57
4.1 Analysis Tools p. 58
4.2 Tool Characteristics p. 59
4.3 Development Cycle p. 60
4.4 Conclusion p. 62
Part II Analysis and Partitioning of Sequential Applications
5 Initial Development of an Application p. 67
5.1 Confidence Building p. 67
5.2 Automatic and Semi-automatic Parallelization p. 69
5.3 Language Proliferation p. 71
5.4 Size of Applications p. 72
5.5 Semi-automatic Partitioning p. 73
5.6 Porting Code p. 75
5.7 Checking a Decomposition p. 77
5.8 Optimizing Compilers p. 77
5.9 Conclusion p. 79
6 Graphical Simulation and Performance Analysis of PPFs p. 81
6.1 Simulating Asynchronous Pipelines p. 82
6.2 Simulation Implementation p. 82
6.3 Graphical Representation p. 84
6.4 Display Features p. 88
6.5 Cross-architectural Comparison p. 89
6.6 Conclusion p. 93
7 Template-based Implementation p. 95
7.1 Template Design Principles p. 96
7.2 Implementation Choices p. 99
7.3 Parallel Logic Implementation p. 100
7.4 Target Machine Implementation p. 101
7.4.1 Common implementation issues p. 102
7.5 'NOW' Implementation for Logic Debugging p. 104
7.6 Target Machine Implementations for Performance Tuning p. 109
7.7 Patterns and Templates p. 112
7.8 Conclusion p. 113
Part III Case Studies
8 Application Examples p. 117
8.1 Case Study 1: H.261 Encoder p. 118
8.1.1 Purpose of parallelization p. 119
8.1.2 'Per macroblock' quantization without motion estimation p. 119
8.1.3 'Per picture' quantization without motion estimation p. 123
8.1.4 'Per picture' quantization with motion estimation p. 125
8.1.5 Implementation of the parallel encoders p. 126
8.1.6 H.261 encoders without motion estimation p. 128
8.1.7 H.261 encoder with motion estimation p. 129
8.1.8 Edge data exchange p. 131
8.2 Case Study 2: H.263 Encoder/Decoder p. 132
8.2.1 Static analysis of H.263 algorithm p. 134
8.2.2 Results from parallelizing H.263 p. 135
8.3 Case Study 3: 'Eigenfaces'--Face Detection p. 139
8.3.1 Background p. 139
8.3.2 Eigenfaces algorithm p. 140
8.3.3 Parallelization steps p. 141
8.3.4 Introduction of second and third farms p. 143
8.4 Case Study 4: Optical Flow p. 145
8.4.1 Optical flow p. 145
8.4.2 Existing sequential implementation p. 147
8.4.3 Gradient-based routine p. 147
8.4.4 Multi-resolution routine p. 150
8.4.5 Phase-based routine p. 154
8.4.6 LK results p. 156
8.4.7 Other methods p. 158
8.4.8 Evaluation p. 160
8.5 Conclusion p. 161
9 Design Studies p. 163
9.1 Case Study 1: Karhunen-Loeve Transform (KLT) p. 164
9.1.1 Applications of the KLT p. 164
9.1.2 Features of the KLT p. 165
9.1.3 Parallelization of the KLT p. 165
9.1.4 PPF parallelization p. 168
9.1.5 Implementation p. 171
9.2 Case Study 2: 2D-Wavelet Transform p. 171
9.2.1 Wavelet Transform p. 172
9.2.2 Computational algorithms p. 173
9.2.3 Parallel implementation of the Discrete Wavelet Transform (DWT) p. 173
9.2.4 Parallel implementation of the oversampled WT p. 176
9.3 Case Study 3: Vector Quantization p. 179
9.3.1 Parallelization of VQ p. 180
9.3.2 PPF schemes for VQ p. 181
9.3.3 VQ implementation p. 183
9.4 Conclusion p. 186
10 Counter Examples p. 189
10.1 Case Study 1: Large Vocabulary Continuous-Speech Recognition p. 190
10.1.1 Background p. 190
10.1.2 Static analysis of the LVCR system p. 191
10.1.3 Parallel design p. 193
10.1.4 Implementation on an SMP p. 195
10.2 Case Study 2: Model-based Coding p. 196
10.2.1 Parallelization of the model-based coder p. 196
10.2.2 Analysis of results p. 198
10.3 Case Study 3: Microphone Beam Array p. 202
10.3.1 Griffiths-Jim beam-former p. 202
10.3.2 Sequential implementation p. 203
10.3.3 Parallel implementation of the G-J algorithm p. 204
10.4 Conclusion p. 206
Part IV Underlying Theory and Analysis
11 Performance of PPFs p. 211
11.1 Naming Conventions p. 212
11.2 Performance Metrics p. 212
11.2.1 Order statistics p. 213
11.2.2 Asymptotic distribution p. 216
11.2.3 Characteristic maximum p. 217
11.2.4 Sample estimate p. 219
11.3 Gathering Performance Data p. 220
11.4 Performance Prediction Equations p. 221
11.5 Results p. 223
11.5.1 Prediction results p. 224
11.6 Simulation Results p. 225
11.7 Asynchronous Pipeline Estimate p. 227
11.8 Ordering Constraints p. 230
11.9 Task Scheduling p. 235
11.9.1 Uniform task size p. 236
11.9.2 Decreasing task size p. 236
11.9.3 Heuristic scheduling schemes p. 237
11.9.4 Validity of factoring p. 238
11.10 Scheduling Results p. 238
11.10.1 Timings p. 238
11.10.2 Simulation results p. 240
11.11 Conclusion p. 241
Appendix p. 242
A.1 Outline derivation of Kruskal-Weiss prediction equation p. 242
A.2 Factoring regime derivation p. 243
12 Instrumentation of Templates p. 247
12.1 Global Time p. 248
12.2 Processor Model p. 249
12.3 Local Clock Requirements p. 249
12.4 Steady-state Behavior p. 250
12.5 Establishing a Refresh Interval p. 253
12.6 Local Clock Adjustment p. 256
12.7 Implementation on the Paramid p. 257
12.8 Conclusion p. 259
Part V Future Trends
13 Future Trends p. 263
13.1 Designing for Differing Embedded Hardware p. 265
13.2 Adapting to Mobile Networked Computation p. 265
13.3 Conclusion p. 267
References p. 269
Index p. 299