Title:
Multimodal signal processing : theory and applications for human-computer interaction
Series:
EURASIP and Academic Press series in signal and image processing
Publication Information:
New York, NY : Elsevier, 2010
Physical Description:
xiv, 328 p. : ill. ; 24 cm.
ISBN:
9780123748256

Available:

Item Barcode: 30000010218500
Call Number: QA76.9.H85 M84 2010
Material Type: Open Access Book
Item Category 1: Book

Summary

Multimodal signal processing is an important research and development field in which signals from a variety of modalities - speech, vision, language, text - are processed and their information combined, significantly enhancing the understanding, modelling and performance of human-computer interaction devices, as well as of systems that support human-human communication. The overarching theme of this book is the application of signal processing and statistical machine learning techniques to problems arising in this multidisciplinary field. It describes the capabilities and limitations of current technologies, and discusses the technical challenges that must be overcome to develop efficient and user-friendly multimodal interactive systems.

With contributions from leading experts in the field, this book should serve as a reference on multimodal signal processing for signal processing researchers, graduate students, R&D engineers and computer engineers interested in this emerging field.


Author Notes

Jean-Philippe Thiran received his PhD from the Université catholique de Louvain (UCL) in 1997. He is an Assistant Professor at the Swiss Federal Institute of Technology, Lausanne (EPFL), Switzerland, where he is responsible for the image analysis group. Dr Thiran's current scientific interests include image segmentation, multimodal signal processing and medical image analysis.
Ferran Marqués is a Full Professor in the TSC Department of the Universitat Politècnica de Catalunya (UPC), where he lectures on digital signal and image processing. He previously held posts at EPFL and the University of Southern California. He received his PhD from UPC in December 1992.
Hervé Bourlard is Director of the Idiap Research Institute, Full Professor at the Swiss Federal Institute of Technology, Lausanne (EPFL), and Director of a National Centre of Competence in Research on 'Interactive Multimodal Information Management'. His current interests mainly include statistical pattern classification, signal processing, multi-channel processing, artificial neural networks and applied mathematics.


Table of Contents

Contributors: Jean-Philippe Thiran, Ferran Marqués and Hervé Bourlard; Samy Bengio; Thierry Dutoit and Stéphane Dupont; Olivier Pietquin; Montse Pardàs, Verónica Vilaplana and Cristian Canton-Ferrer; Claus Vielhauer; Mihai Gurban and Jean-Philippe Thiran; Norman Poh and Josef Kittler; Mihai Gurban and Jean-Philippe Thiran; Konstantinos Moustakas, Savvas Argyropoulos and Dimitrios Tzovaras; Andrei Popescu-Belis; Natalie Ruiz, Fang Chen and Sharon Oviatt; Igor S. Pandžić; Stéphane Marchand-Maillet, Donn Morrison, Enikö Szekely and Eric Bruno; Daniel Gatica-Perez
Preface p. xiii
1 Introduction p. 1
Part I Signal Processing, Modelling and Related Mathematical Tools p. 5
2 Statistical Machine Learning for HCI p. 7
2.1 Introduction p. 7
2.2 Introduction to Statistical Learning p. 8
2.2.1 Types of Problem p. 8
2.2.2 Function Space p. 9
2.2.3 Loss Functions p. 10
2.2.4 Expected Risk and Empirical Risk p. 10
2.2.5 Statistical Learning Theory p. 11
2.3 Support Vector Machines for Binary Classification p. 13
2.4 Hidden Markov Models for Speech Recognition p. 16
2.4.1 Speech Recognition p. 17
2.4.2 Markovian Processes p. 17
2.4.3 Hidden Markov Models p. 18
2.4.4 Inference and Learning with HMMs p. 20
2.4.5 HMMs for Speech Recognition p. 22
2.5 Conclusion p. 22
References p. 23
3 Speech Processing p. 25
3.1 Introduction p. 26
3.2 Speech Recognition p. 28
3.2.1 Feature Extraction p. 28
3.2.2 Acoustic Modelling p. 30
3.2.3 Language Modelling p. 33
3.2.4 Decoding p. 34
3.2.5 Multiple Sensors p. 35
3.2.6 Confidence Measures p. 37
3.2.7 Robustness p. 38
3.3 Speaker Recognition p. 40
3.3.1 Overview p. 40
3.3.2 Robustness p. 43
3.4 Text-to-Speech Synthesis p. 44
3.4.1 Natural Language Processing for Speech Synthesis p. 44
3.4.2 Concatenative Synthesis with a Fixed Inventory p. 46
3.4.3 Unit Selection-Based Synthesis p. 50
3.4.4 Statistical Parametric Synthesis p. 53
3.5 Conclusions p. 56
References p. 57
4 Natural Language and Dialogue Processing p. 63
4.1 Introduction p. 63
4.2 Natural Language Understanding p. 64
4.2.1 Syntactic Parsing p. 64
4.2.2 Semantic Parsing p. 68
4.2.3 Contextual Interpretation p. 70
4.3 Natural Language Generation p. 71
4.3.1 Document Planning p. 72
4.3.2 Microplanning p. 73
4.3.3 Surface Realisation p. 73
4.4 Dialogue Processing p. 74
4.4.1 Discourse Modelling p. 74
4.4.2 Dialogue Management p. 77
4.4.3 Degrees of Initiative p. 80
4.4.4 Evaluation p. 81
4.5 Conclusion p. 85
References p. 85
5 Image and Video Processing Tools for HCI p. 93
5.1 Introduction p. 93
5.2 Face Analyses p. 94
5.2.1 Face Detection p. 95
5.2.2 Face Tracking p. 96
5.2.3 Facial Feature Detection and Tracking p. 98
5.2.4 Gaze Analysis p. 100
5.2.5 Face Recognition p. 101
5.2.6 Facial Expression Recognition p. 103
5.3 Hand-Gesture Analysis p. 104
5.4 Head Orientation Analysis and FoA Estimation p. 106
5.4.1 Head Orientation Analysis p. 106
5.4.2 Focus of Attention Estimation p. 107
5.5 Body Gesture Analysis p. 109
5.6 Conclusions p. 112
References p. 112
6 Processing of Handwriting and Sketching Dynamics p. 119
6.1 Introduction p. 119
6.2 History of Handwriting Modality and the Acquisition of Online Handwriting Signals p. 121
6.3 Basics in Acquisition, Examples for Sensors p. 123
6.4 Analysis of Online Handwriting and Sketching Signals p. 124
6.5 Overview of Recognition Goals in HCI p. 125
6.6 Sketch Recognition for User Interface Design p. 128
6.7 Similarity Search in Digital Ink p. 133
6.8 Summary and Perspectives for Handwriting and Sketching in HCI p. 138
References p. 139
Part II Multimodal Signal Processing and Modelling p. 143
7 Basic Concepts of Multimodal Analysis p. 145
7.1 Defining Multimodality p. 145
7.2 Advantages of Multimodal Analysis p. 148
7.3 Conclusion p. 151
References p. 152
8 Multimodal Information Fusion p. 153
8.1 Introduction p. 153
8.2 Levels of Fusion p. 156
8.3 Adaptive versus Non-Adaptive Fusion p. 158
8.4 Other Design Issues p. 162
8.5 Conclusions p. 165
References p. 165
9 Modality Integration Methods p. 171
9.1 Introduction p. 171
9.2 Multimodal Fusion for AVSR p. 172
9.2.1 Types of Fusion p. 172
9.2.2 Multistream HMMs p. 174
9.2.3 Stream Reliability Estimates p. 174
9.3 Multimodal Speaker Localisation p. 178
9.4 Conclusion p. 181
References p. 181
10 A Multimodal Recognition Framework for Joint Modality Compensation and Fusion p. 185
10.1 Introduction p. 186
10.2 Joint Modality Recognition and Applications p. 188
10.3 A New Joint Modality Recognition Scheme p. 191
10.3.1 Concept p. 191
10.3.2 Theoretical Background p. 191
10.4 Joint Modality Audio-Visual Speech Recognition p. 194
10.4.1 Signature Extraction Stage p. 196
10.4.2 Recognition Stage p. 197
10.5 Joint Modality Recognition in Biometrics p. 198
10.5.1 Overview p. 198
10.5.2 Results p. 199
10.6 Conclusions p. 203
References p. 204
11 Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions p. 207
11.1 Introduction p. 208
11.2 Setting the Stage: Concepts and Projects p. 208
11.2.1 Metadata versus Annotations p. 209
11.2.2 Examples of Large Multimodal Collections p. 210
11.3 Capturing and Recording Multimodal Data p. 211
11.3.1 Capture Devices p. 211
11.3.2 Synchronisation p. 212
11.3.3 Activity Types in Multimodal Corpora p. 213
11.3.4 Examples of Set-ups and Raw Data p. 213
11.4 Reference Metadata and Annotations p. 214
11.4.1 Gathering Metadata: Methods p. 215
11.4.2 Metadata for the AMI Corpus p. 216
11.4.3 Reference Annotations: Procedure and Tools p. 217
11.5 Data Storage and Access p. 219
11.5.1 Exchange Formats for Metadata and Annotations p. 219
11.5.2 Data Servers p. 221
11.5.3 Accessing Annotated Multimodal Data p. 222
11.6 Conclusions and Perspectives p. 223
References p. 224
Part III Multimodal Human-Computer and Human-to-Human Interaction p. 229
12 Multimodal Input p. 231
12.1 Introduction p. 231
12.2 Advantages of Multimodal Input Interfaces p. 232
12.2.1 State-of-the-Art Multimodal Input Systems p. 234
12.3 Multimodality, Cognition and Performance p. 237
12.3.1 Multimodal Perception and Cognition p. 237
12.3.2 Cognitive Load and Performance p. 238
12.4 Understanding Multimodal Input Behaviour p. 239
12.4.1 Theoretical Frameworks p. 240
12.4.2 Interpretation of Multimodal Input Patterns p. 243
12.5 Adaptive Multimodal Interfaces p. 245
12.5.1 Designing Multimodal Interfaces that Manage Users' Cognitive Load p. 246
12.5.2 Designing Low-Load Multimodal Interfaces for Education p. 248
12.6 Conclusions and Future Directions p. 250
References p. 251
13 Multimodal Output: Facial Motion, Gestures and Synthesised Speech Synchronisation p. 257
13.1 Introduction p. 257
13.2 Basic AV Speech Synthesis p. 258
13.3 The Animation System p. 260
13.4 Coarticulation p. 263
13.5 Extended AV Speech Synthesis p. 264
13.5.1 Data-Driven Approaches p. 267
13.5.2 Rule-Based Approaches p. 269
13.6 Embodied Conversational Agents p. 270
13.7 TTS Timing Issues p. 272
13.7.1 On-the-Fly Synchronisation p. 272
13.7.2 A Priori Synchronisation p. 273
13.8 Conclusion p. 274
References p. 274
14 Interactive Representations of Multimodal Databases p. 279
14.1 Introduction p. 279
14.2 Multimodal Data Representation p. 280
14.3 Multimodal Data Access p. 283
14.3.1 Browsing as Extension of the Query Formulation Mechanism p. 283
14.3.2 Browsing for the Exploration of the Content Space p. 287
14.3.3 Alternative Representations p. 292
14.3.4 Evaluation p. 292
14.3.5 Commercial Impact p. 293
14.4 Gaining Semantics from User Interaction p. 294
14.4.1 Multimodal Interactive Retrieval p. 294
14.4.2 Crowdsourcing p. 295
14.5 Conclusion and Discussion p. 298
References p. 299
15 Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour p. 309
15.1 Introduction p. 309
15.2 Perspectives on Interest Modelling p. 311
15.3 Computing Interest from Audio Cues p. 315
15.4 Computing Interest from Multimodal Cues p. 318
15.5 Other Concepts Related to Interest p. 320
15.6 Concluding Remarks p. 322
Index p. 327