Voice Compression and Communications: Principles and Applications for Fixed and Wireless Channels

Up-to-date, expert coverage of topics in wireless voice communications
Voice communication is the most important facet of mobile radio service. Even when the predicted surge of wireless data and Internet services becomes a reality, voice will remain the most natural means of human communication.
Voice Compression and Communications details issues in wireless voice communications and treats compression, channel coding, and wireless transmission as a joint subject. Part I covers background material, whereas Part II provides detailed information on both proprietary and standardized analysis-by-synthesis codecs, including the speech codecs of virtually all existing wireline-based and wireless systems. Parts III and IV mainly discuss research-based wideband, audio, and very low-rate schemes that are likely to find their way into future standards.
Voice Compression and Communications describes fundamental concepts in a non-mathematical way early in the book for those with only a background knowledge of signal processing and communications. More advanced readers will find detailed discussions of theoretical principles, future concepts, and solutions to various specific wireless voice communications problems.

Author Notes

LAJOS HANZO has coauthored five books on mobile radio communications and published more than 300 research papers on a variety of topics in wireless multimedia communications. He holds a chair in telecommunications at the Department of Electronics and Computer Science, University of Southampton, UK, and he is an IEEE Distinguished Lecturer.
F. CLARE A. SOMERVILLE is with the Global Wireless Systems Research Department, Bell Laboratories, Swindon, UK. His current research involves real-time techniques for transmission of voice over GPRS and the resultant speech quality attained.
JASON P. WOODARD is with UbiNetics Ltd., where he is responsible for the development and implementation of various algorithms for third-generation mobile communications products.

Reviews

Choice Review

Hanzo (Univ. of Southampton, UK), Somerville (Bell Laboratories, UK), and Woodard offer a treatise on voice compression theory and practice that comprehensively treats this field's evolution and current state of the art. The book features four major sections; the first two, "Speech Signals and Waveform Coding" and "Analysis by Synthesis Coding," discuss fundamental concepts, codec implementations (codecs are algorithms used to encode or decode, compress or decompress, various types of data to save disk space, such as sound or video files), and many related design topics. The final two sections, "Wideband Coding and Transmission" and "Very Low-Rate Coding and Transmission," focus on current research initiatives. The book is intended as a resource and design guide for practitioners of voice communications systems, e.g., wireless telephony. Other than for the initial chapters, readers will require a relatively advanced grasp of mathematics and related engineering skills. More than 300 references to other papers, books, and standards; topic and author indexes. The authors are experienced and knowledgeable and have produced a significant new addition to this field's literature. Researchers; faculty; professionals. E. M. Aupperle, University of Michigan

Preface	p. xxiii
Acknowledgments	p. xxix
Part I Speech Signals and Waveform Coding	p. 1
Chapter 1 Speech Signals and Introduction to Speech Coding	p. 3
1.1 Motivation of Speech Compression	p. 3
1.2 Basic Characterization of Speech Signals	p. 4
1.3 Classification of Speech Codecs	p. 7
1.4 Waveform Coding	p. 11
1.5 Chapter Summary	p. 26
Chapter 2 Predictive Coding	p. 27
2.1 Forward Predictive Coding	p. 27
2.2 DPCM Codec Schematic	p. 28
2.3 Predictor Design	p. 29
2.4 Adaptive One-Word-Memory Quantization	p. 36
2.5 DPCM Performance	p. 37
2.6 Backward-Adaptive Prediction	p. 39
2.7 The 32 kbps G.721 ADPCM Codec	p. 43
2.8 Subjective and Objective Speech Quality	p. 49
2.9 Variable-Rate G.726 and Embedded G.727 ADPCM	p. 50
2.10 Rate-Distortion in Predictive Coding	p. 58
2.11 Chapter Summary	p. 62
Part II Analysis by Synthesis Coding	p. 63
Chapter 3 Analysis-by-Synthesis Principles	p. 65
3.1 Motivation	p. 65
3.2 Analysis-by-Synthesis Codec Structure	p. 66
3.3 The Short-Term Synthesis Filter	p. 67
3.4 Long-Term Prediction	p. 70
3.5 Excitation Models	p. 78
3.6 Adaptive Short-Term and Long-Term Post-Filtering	p. 81
3.7 Lattice-Based Linear Prediction	p. 83
3.8 Chapter Summary	p. 89
Chapter 4 Speech Spectral Quantization	p. 90
4.1 Log-Area Ratios	p. 90
4.2 Line Spectral Frequencies	p. 95
4.3 Vector Quantization of Spectral Parameters	p. 105
4.4 Spectral Quantizers for Wideband Speech Coding	p. 113
4.5 Chapter Summary	p. 126
Chapter 5 Regular Pulse Excited Coding	p. 127
5.1 Theoretical Background	p. 127
5.2 The 13 kbps RPE-LTP GSM Speech Encoder	p. 133
5.3 The 13 kbps RPE-LTP GSM Speech Decoder	p. 137
5.4 Bit Sensitivity of the 13 kbps GSM RPE-LTP Codec	p. 140
5.5 Application Example: A Toolbox-Based Speech Transceiver	p. 142
5.6 Chapter Summary	p. 144
Chapter 6 Forward-Adaptive Code Excited Linear Prediction	p. 145
6.1 Background	p. 145
6.2 The Original CELP Approach	p. 146
6.3 Fixed Codebook Search	p. 149
6.4 CELP Excitation Models	p. 151
6.5 Optimization of the CELP Codec Parameters	p. 160
6.6 The Error-Sensitivity of CELP Codecs	p. 175
6.7 Application Example: A Dual-Mode 3.1 kBd Speech Transceiver	p. 187
6.8 Multi-Slot PRMA Transceiver	p. 200
6.9 Chapter Summary	p. 206
Chapter 7 Standard Forward-Adaptive CELP Codecs	p. 207
7.1 Background	p. 207
7.2 The U.S. DoD FS-1016 4.8 kbits/s CELP Codec	p. 207
7.3 The IS-54 DAMPS kbps Pan American Speech Codec	p. 213
7.4 The 6.7 kbps Japanese Digital Cellular System's Speech Codec	p. 216
7.5 The Qualcomm Variable-Rate CELP Codec	p. 218
7.6 Japanese Half-Rate Speech Codec	p. 225
7.7 The Half-Rate GSM Codec	p. 233
7.8 The 8 kbits/s G.729 Codec	p. 237
7.9 The Reduced Complexity G.729 Annex A Codec	p. 256
7.10 The 12.2 kbps Enhanced Full-Rate GSM Speech Codec	p. 259
7.11 The Enhanced Full-Rate 7.4 kbps IS-136 Speech Codec	p. 264
7.12 The ITU G.723.1 Dual-Rate Codec	p. 268
7.13 Chapter Summary	p. 277
Chapter 8 Backward-Adaptive Code Excited Linear Prediction	p. 279
8.1 Introduction	p. 279
8.2 Motivation and Background	p. 279
8.3 Backward-Adaptive G.728 Codec Schematic	p. 282
8.4 Backward-Adaptive G.728 Coding Algorithm	p. 284
8.5 Reduced-Rate G.728-Like Codec: Variable-Length Excitation Vector	p. 298
8.6 The Effects of Long-Term Prediction	p. 300
8.7 Closed-Loop Codebook Training	p. 305
8.8 Reduced-Rate G.728-Like Codec II: Constant-Length Excitation Vector	p. 309
8.9 Programmable-Rate 8-4 kbps Low-Delay CELP Codecs	p. 310
8.10 Backward-Adaptive Error Sensitivity Issues	p. 327
8.11 A Low-Delay Multimode Speech Transceiver	p. 333
8.12 Chapter Summary	p. 338
Part III Wideband Coding and Transmission	p. 339
Chapter 9 Wideband Speech Coding	p. 341
9.1 Sub-band-ADPCM Wideband Coding at 64 kbps	p. 341
9.2 Wideband Transform Coding at 32 kbps	p. 357
9.3 Sub-Band-Split Wideband CELP Codecs	p. 360
9.4 Fullband Wideband ACELP Coding	p. 363
9.5 A Turbo-Coded Burst-by-Burst Adaptive Wideband Speech Transceiver	p. 368
9.6 Chapter Summary	p. 384
Part IV Very Low-Rate Coding and Transmission	p. 385
Chapter 10 Overview of Low-Rate Speech Coding	p. 387
10.1 Low-Bitrate Speech Coding	p. 387
10.2 Linear Predictive Coding Model	p. 400
10.3 Speech Quality Measurements	p. 403
10.4 Speech Database	p. 406
10.5 Chapter Summary	p. 409
Chapter 11 Linear Predictive Vocoder	p. 411
11.1 Overview of a Linear Predictive Vocoder	p. 411
11.2 Line Spectrum Frequencies Quantization	p. 412
11.3 Pitch Detection	p. 417
11.4 Unvoiced Frames	p. 428
11.5 Voiced Frames	p. 429
11.6 Adaptive Post-Filter	p. 430
11.7 Pulse Dispersion Filter	p. 432
11.8 Results for Linear Predictive Vocoder	p. 437
11.9 Chapter Summary	p. 440
Chapter 12 Wavelets and Pitch Detection	p. 441
12.1 Conceptual Introduction to Wavelets	p. 441
12.2 Introduction to Wavelet Mathematics	p. 444
12.3 Pre-Processing the Wavelet Transform Signal	p. 449
12.4 Voiced-Unvoiced Decision	p. 452
12.5 Wavelet-Based Pitch Detector	p. 453
12.6 Summary and Conclusions	p. 460
Chapter 13 Zinc Function Excitation	p. 461
13.1 Introduction	p. 461
13.2 Overview of Prototype Waveform Interpolation Zinc Function Excitation	p. 462
13.3 Zinc Function Modeling	p. 466
13.4 Pitch Detection	p. 470
13.5 Voiced Speech	p. 473
13.6 Excitation Interpolation Between Prototype Segments	p. 477
13.7 Unvoiced Speech	p. 483
13.8 Adaptive Post-Filter	p. 483
13.9 Results for Single Zinc Function Excitation	p. 483
13.10 Error Sensitivity of the 1.9 kbps PWI-ZFE Coder	p. 486
13.11 Multiple Zinc Function Excitation	p. 490
13.12 A Sixth-Rate, 3.8 kbps GSM-Like Speech Transceiver	p. 496
13.13 Chapter Summary	p. 500
Chapter 14 Mixed-Multiband Excitation	p. 501
14.1 Introduction	p. 501
14.2 Overview of Mixed-Multiband Excitation	p. 502
14.3 Finite Impulse Response Filter	p. 504
14.4 Mixed-Multiband Excitation Encoder	p. 507
14.5 Mixed-Multiband Excitation Decoder	p. 510
14.6 Performance of the Mixed-Multiband Excitation Coder	p. 513
14.7 A Higher Rate 3.85 kbps Mixed-Multiband Excitation Scheme	p. 520
14.8 A 2.35 kbit/s Joint-Detection-Based CDMA Speech Transceiver	p. 523
14.9 Chapter Summary	p. 530
Chapter 15 Sinusoidal Transform Coding Below 4 kbps	p. 531
15.1 Introduction	p. 531
15.2 Sinusoidal Analysis of Speech Signals	p. 532
15.3 Sinusoidal Synthesis of Speech Signals	p. 534
15.4 Low-Bitrate Sinusoidal Coders	p. 536
15.5 Incorporating Prototype Waveform Interpolation	p. 539
15.6 Encoding the Sinusoidal Frequency Component	p. 541
15.7 Determining the Excitation Components	p. 543
15.8 Quantizing the Excitation Parameters	p. 548
15.9 Sinusoidal Transform Decoder	p. 556
15.10 Speech Coder Performance	p. 558
15.11 Chapter Summary	p. 563
Chapter 16 Conclusions on Low-Rate Coding	p. 565
16.1 Overview	p. 565
16.2 Listening Tests	p. 565
16.3 Summary of Very Low-Rate Coding	p. 567
16.4 Further Research	p. 568
Chapter 17 Comparison of Speech Codecs and Transceivers	p. 569
17.1 Background to Speech Quality Evaluation	p. 569
17.2 Objective Speech Quality Measures	p. 570
17.3 Subjective Measures	p. 577
17.4 Comparison of Subjective and Objective Measures	p. 578
17.5 Subjective Speech Quality of Various Codecs	p. 580
17.6 Error Sensitivity Comparison of Various Codecs	p. 582
17.7 Objective Speech Performance of Various Transceivers	p. 583
Appendix A Constructing the Quadratic Spline Wavelets	p. 589
Appendix B Zinc Function Excitation	p. 593
Appendix C Probability Density Function for Amplitudes	p. 597
Bibliography	p. 601
Index	p. 623
Author Index	p. 631
