Title: Pattern recognition in speech and language processing
Publication Information: London : CRC Press, 2003
ISBN: 9780849312328

Available:
Item Barcode: 30000010029169
Call Number: TK7882.S65 P39 2003
Material Type: Open Access Book
Item Category: Book

Summary

Over the last 20 years, approaches to designing speech and language processing algorithms have moved from methods based on linguistics and speech science to data-driven pattern recognition techniques. These techniques have been the focus of intense, fast-moving research and have contributed to significant advances in this field.

Pattern Recognition in Speech and Language Processing offers a systematic, up-to-date presentation of these recent developments. It begins with the fundamentals and recent theoretical advances in pattern recognition, with emphasis on classifier design criteria and optimization procedures. The focus then shifts to the application of these techniques to speech processing, with chapters exploring advances in applying pattern recognition to real speech and audio processing systems. The final section of the book examines topics related to pattern recognition in language processing: topics that represent promising new trends with direct impact on information processing systems for the Web, broadcast news, and other content-rich information resources.

Each self-contained chapter includes figures, tables, diagrams, and references. The collective effort of experts at the forefront of the field, Pattern Recognition in Speech and Language Processing offers in-depth, insightful discussions on new developments and contains a wealth of information integral to the further development of human-machine communications.


Table of Contents

Contributors: Wu Chou; Vaibhava Goel and William Byrne; Qiang Huo; Shigeru Katagiri; Jean-Luc Gauvain and Lori Lamel; Sadaoki Furui; Qi Li and Biing-Hwang Juang; Richard M. Schwartz and John Makhoul; Jerome R. Bellegarda; A. L. Gorin, A. Abella, T. Alonso, G. Riccardi, and J. H. Wright; Herman Ney and F. J. Och; James Allan
1 Minimum Classification Error (MCE) Approach in Pattern Recognition, p. 1
1.1 Introduction, p. 1
1.2 Optimal Classifier from Bayes Decision Theory, p. 3
1.3 Discriminant Function Approach to Classifier Design, p. 6
1.4 Speech Recognition and Hidden Markov Modeling, p. 7
1.4.1 Hidden Markov Modeling of Speech, p. 8
1.5 MCE Classifier Design Using Discriminant Functions, p. 10
1.5.1 MCE Classifier Design Strategy, p. 10
1.5.2 Optimization Methods, p. 12
1.5.3 Other Optimization Methods, p. 14
1.5.4 HMM as a Discriminant Function, p. 15
1.5.5 Relation between MCE and MMI, p. 18
1.5.6 Discussions and Comments, p. 22
1.6 Embedded String Model Based MCE Training, p. 24
1.6.1 String Model Based MCE Approach, p. 25
1.6.2 Combined String Model Based MCE Approach, p. 29
1.6.3 Discriminative Feature Extraction, p. 32
1.7 Verification and Identification, p. 33
1.7.1 Speaker Verification and Identification, p. 35
1.7.2 Utterance Verification, p. 37
1.8 Summary, p. 40
2 Minimum Bayes-Risk Methods in Automatic Speech Recognition, p. 51
2.1 Minimum Bayes-Risk Classification Framework, p. 52
2.1.1 Likelihood Ratio Based Hypothesis Testing, p. 53
2.1.2 Maximum A-Posteriori Probability Classification, p. 54
2.1.3 Previous Studies of Application Sensitive ASR, p. 54
2.2 Practical MBR Procedures for ASR, p. 55
2.2.1 Summation over Hidden State Sequences, p. 56
2.2.2 MBR Recognition with N-best Lists, p. 57
2.2.3 MBR Recognition with Lattices, p. 57
2.3 Segmental MBR Procedures, p. 64
2.3.1 Segmental Voting, p. 66
2.3.2 ROVER, p. 67
2.3.3 e-ROVER, p. 68
2.4 Experimental Results, p. 70
2.4.1 Parameter Tuning within the MBR Classification Rule, p. 70
2.4.2 Utterance Level MBR Word and Keyword Recognition, p. 72
2.4.3 ROVER and e-ROVER for Multilingual ASR, p. 74
2.5 Summary, p. 76
2.6 Acknowledgements, p. 77
3 A Decision Theoretic Formulation for Robust Automatic Speech Recognition, p. 81
3.1 Introduction, p. 81
3.2 Optimal Bayes' Decision Rule for ASR, p. 83
3.3 Adaptive Decision Rules Constructed from Training Samples, p. 85
3.3.1 Plug-in Bayes' Decision Rules with Maximum-likelihood Density Estimate, p. 86
3.3.2 Maximum-Discriminant Decision Rules Minimizing the Empirical Classification Error, p. 89
3.3.3 Discussion, p. 90
3.4 Violations of Modeling Assumptions in ASR, p. 91
3.4.1 Types of Distortions, p. 91
3.4.2 Towards Adaptive and Robust ASR, p. 92
3.5 Improving Adaptive Decision Rules via Decision Parameter Adaptation, p. 93
3.5.1 Decision Parameter Adaptation for Stationary Operating Conditions, p. 93
3.5.2 Decision Parameter Adaptation for Slowly Changing Operating Conditions, p. 95
3.5.3 Decision Parameter Adaptation for Switching Operating Conditions, p. 96
3.5.4 Discussion, p. 97
3.6 Robust Decision Rules, p. 97
3.6.1 Decision Rule Robustness, p. 97
3.6.2 Minimax Classification Rule, p. 98
3.6.3 Bayesian Predictive Classification Rule, p. 100
3.6.4 Discussion, p. 103
3.7 Summary, p. 104
4 Speech Pattern Recognition using Neural Networks, p. 115
4.1 Introduction, p. 115
4.2 Bayes Decision Theory, p. 117
4.2.1 Preparations, p. 117
4.2.2 Decision Rule, p. 117
4.2.3 Minimum Error-rate Classification, p. 118
4.2.4 Probability Function Estimation, p. 118
4.2.5 Discriminative Training, p. 119
4.3 Speech Recognizers Based on Neural Networks, p. 124
4.3.1 Preparations, p. 124
4.3.2 Classification Error Minimization, p. 125
4.3.3 Squared Error Minimization, p. 130
4.3.4 Cross Entropy Minimization, p. 133
4.4 Fusion of Multiple Classification Decisions, p. 135
4.4.1 Principles, p. 135
4.4.2 Examples of Embodiment, p. 138
4.5 Concluding Remarks, p. 143
4.6 Appendix: Maximizing Mutual Information, p. 146
5 Large Vocabulary Speech Recognition Based on Statistical Methods, p. 149
5.1 Introduction, p. 149
5.2 Overview, p. 150
5.3 Language Modeling, p. 151
5.3.1 Text Preparation, p. 153
5.3.2 Vocabulary Selection, p. 154
5.3.3 N-gram Estimation, p. 154
5.3.4 LM Adaptation, p. 156
5.4 Pronunciation Modeling, p. 156
5.5 Acoustic Modeling, p. 159
5.5.1 Acoustic Front-end, p. 159
5.5.2 Modeling Allophones, p. 161
5.5.3 HMM Parameter Estimation, p. 163
5.5.4 HMM Adaptation, p. 165
5.6 Decoding, p. 167
5.6.1 Speech/Non-speech Detection, p. 168
5.6.2 Decoding Strategies, p. 169
5.6.3 Efficiency, p. 170
5.6.4 Confidence Measures, p. 172
5.7 Indicative Performance Levels, p. 172
5.7.1 Dictation, p. 173
5.7.2 Speech Recognition for Dialog Systems, p. 174
5.7.3 Transcription for Audio Indexation, p. 175
5.8 Portability and Language Dependencies, p. 177
6 Toward Spontaneous Speech Recognition and Understanding, p. 191
6.1 Introduction, p. 191
6.2 Four Categories of Speech Recognition Tasks, p. 193
6.3 Spontaneous Speech Recognition and Understanding - Review, p. 195
6.3.1 Category I (human-to-human dialogue), p. 195
6.3.2 Category II (human-to-human monologue), p. 196
6.3.3 Category III (human-to-machine dialogue), p. 198
6.4 Japanese National Project on Spontaneous Speech Corpus and Processing Technology, p. 200
6.4.1 Project Overview, p. 200
6.4.2 Corpus, p. 201
6.5 Automatic Transcription of Spontaneous Presentation, p. 202
6.5.1 Recognition Task, p. 202
6.5.2 Language and Acoustic Modeling, p. 202
6.5.3 Recognition Results, p. 203
6.5.4 Analysis on Individual Differences, p. 205
6.5.5 Discussion, p. 210
6.6 Automatic Speech Summarization and Evaluation, p. 212
6.6.1 Summarization of Each Sentence Utterance, p. 212
6.6.2 Summarization of Multiple Utterances, p. 215
6.6.3 Evaluation, p. 215
6.6.4 Discussion, p. 218
6.7 Spontaneous Speech Recognition and Understanding Research Issues, p. 219
6.7.1 Language Models and Corpora, p. 219
6.7.2 Message-driven Speech Recognition and Understanding, p. 220
6.7.3 Statistical Approaches and Speech Science, p. 222
6.7.4 Research on the Human Brain, p. 222
6.7.5 Dynamic Spectral Features, p. 223
6.8 Conclusion, p. 224
7 Speaker Authentication, p. 229
7.1 Introduction, p. 229
7.1.1 Speaker Recognition and Verification, p. 230
7.1.2 Verbal Information Verification, p. 232
7.2 Pattern Recognition in Speaker Authentication, p. 234
7.2.1 Bayesian Decision Theory, p. 234
7.2.2 Stochastic Models for Stationary Process, p. 236
7.2.3 Stochastic Models for Non-Stationary Process, p. 237
7.2.4 Speech Segmentation, p. 239
7.2.5 Statistical Verification, p. 239
7.3 Speaker Verification System, p. 240
7.4 Verbal Information Verification, p. 243
7.4.1 Utterance Segmentation, p. 244
7.4.2 Subword Hypothesis Testing, p. 244
7.4.3 Confidence Measure Calculation, p. 245
7.4.4 Sequential Utterance Verification, p. 247
7.4.5 VIV Experimental Results, p. 249
7.5 Speaker Authentication by Combining SV and VIV, p. 251
7.6 Summary, p. 254
8 HMMs for Language Processing Problems, p. 261
8.1 Introduction, p. 261
8.2 Use of Probabilities, p. 262
8.2.1 Hidden Markov Models, p. 263
8.3 Name Spotting, p. 264
8.4 Topic Classification, p. 266
8.4.1 The Model, p. 267
8.4.2 Estimating HMM Parameters, p. 268
8.4.3 Classification, p. 269
8.4.4 Experiments, p. 269
8.5 Information Retrieval, p. 270
8.5.1 A Bayesian Model for IR, p. 270
8.5.2 Training the IR HMM, p. 271
8.5.3 Performance, p. 271
8.6 Event Tracking, p. 272
8.7 Unsupervised Topic Detection, p. 273
8.8 Summary, p. 275
9 Statistical Language Models With Embedded Latent Semantic Knowledge, p. 277
9.1 Introduction, p. 277
9.1.1 Scope Locality, p. 277
9.1.2 Syntactically-Driven Span Extension, p. 278
9.1.3 Semantically-Driven Span Extension, p. 279
9.1.4 Organization, p. 279
9.2 Latent Semantic Analysis, p. 280
9.2.1 Feature Extraction, p. 280
9.2.2 Singular Value Decomposition, p. 281
9.2.3 General Behavior, p. 282
9.3 LSA Feature Space, p. 283
9.3.1 Word Clustering, p. 284
9.3.2 Word Cluster Example, p. 284
9.3.3 Document Clustering, p. 285
9.3.4 Document Cluster Example, p. 286
9.4 Semantic Classification, p. 287
9.4.1 Framework Extension, p. 287
9.4.2 Semantic Inference, p. 288
9.4.3 Caveats, p. 289
9.5 N-gram+LSA Language Modeling, p. 290
9.5.1 LSA Component, p. 290
9.5.2 Integration with N-grams, p. 292
9.5.3 Context Scope Selection, p. 294
9.6 Smoothing, p. 295
9.6.1 Word Smoothing, p. 295
9.6.2 Document Smoothing, p. 296
9.6.3 Joint Smoothing, p. 297
9.7 Experiments, p. 297
9.7.1 Experimental Conditions, p. 298
9.7.2 Experimental Results, p. 298
9.7.3 Context Scope Selection, p. 299
9.8 Inherent Trade-Offs, p. 300
9.8.1 Cross-Domain Training, p. 300
9.8.2 Discussion, p. 301
9.9 Conclusion, p. 302
10 Semantic Information Processing of Spoken Language - How May I Help You?℠, p. 309
10.1 Introduction, p. 309
10.2 Call-Classification, p. 311
10.3 Language Modeling for Recognition and Understanding, p. 316
10.4 Dialog, p. 318
10.5 Conclusions, p. 319
11 Machine Translation Using Statistical Modeling, p. 321
11.1 Introduction, p. 321
11.2 Statistical Decision Theory and Linguistics, p. 323
11.2.1 The Statistical Approach, p. 323
11.2.2 Bayes Decision Rule for Written Language Translation, p. 324
11.2.3 Related Approaches, p. 324
11.3 Alignment and Lexicon Models, p. 325
11.3.1 Concept of Alignment Modelling, p. 325
11.3.2 Hidden Markov Models, p. 326
11.3.3 Models IBM 1-5, p. 329
11.3.4 Training, p. 331
11.3.5 Search, p. 332
11.3.6 Algorithmic Differences between Speech Recognition and Language Translation, p. 334
11.4 Alignment Templates: From Single Words to Word Groups, p. 335
11.4.1 Concept, p. 335
11.4.2 Training, p. 337
11.4.3 Search, p. 338
11.5 Experimental Results, p. 338
11.5.1 The Task and the Corpus, p. 338
11.5.2 Offline Results, p. 339
11.5.3 Integration into the Verbmobil Prototype System, p. 341
11.5.4 Final Evaluation, p. 341
11.6 Speech Translation: The Integrated Approach, p. 343
11.6.1 Principle, p. 343
11.6.2 Practical Implementation, p. 344
11.7 Summary, p. 346
11.8 References, p. 347
12 Modeling Topics for Detection and Tracking, p. 353
12.1 Topic Detection and Tracking, p. 353
12.1.1 Topic and Events, p. 354
12.1.2 TDT Tasks, p. 354
12.1.3 Corpora, p. 356
12.1.4 Evaluation, p. 357
12.2 Basic Topic Models, p. 359
12.2.1 Vector Space, p. 359
12.2.2 Language Models, p. 360
12.3 Implementing the Models, p. 360
12.3.1 Named Entities, p. 361
12.3.2 Document Expansion, p. 361
12.3.3 Clustering, p. 362
12.3.4 Time Decay, p. 363
12.4 Comparing Models, p. 363
12.4.1 Nearest Neighbors, p. 363
12.4.2 Decision Trees, p. 364
12.4.3 Model-to-Model, p. 365
12.5 Miscellaneous Issues, p. 365
12.5.1 Deferral, p. 366
12.5.2 Multi-modal Issues, p. 366
12.5.3 Multi-lingual Issues, p. 367
12.6 Using TDT Interactively, p. 368
12.6.1 Demonstrations, p. 368
12.6.2 Timelines, p. 369
12.7 Modeling Events, p. 370
12.8 Conclusion, p. 371
Index, p. 377