Title:
Computational paralinguistics : emotion, affect and personality in speech and language processing
Personal Author:
Schuller, Björn
Publication Information:
Hoboken, N.J. : John Wiley & Sons Inc., 2014
Physical Description:
xxi, 321 p. : ill. ; 25 cm.
ISBN:
9781119971368
Added Author:
Batliner, Anton

Available:

Item Barcode: 30000010334660
Call Number: P37.5.D37 S34 2014
Material Type: Open Access Book
Item Category 1: Book

On Order

Summary

This book presents the methods, tools and techniques currently used to automatically recognise the affect, emotion, personality and everything else beyond linguistics ('paralinguistics') expressed by or embedded in human speech and language.

It is the first book to provide such a systematic survey of paralinguistics in speech and language processing. The technology described has evolved mainly from automatic speech and speaker recognition and processing, but also takes into account recent developments within speech signal processing, machine intelligence and data mining.

Moreover, the book offers a hands-on approach by integrating actual data sets, software, and open-source utilities which will make the book invaluable as a teaching tool and similarly useful for those professionals already in the field.

Key features:

- Provides an integrated presentation of basic research (in phonetics/linguistics and humanities) with state-of-the-art engineering approaches for speech signal processing and machine intelligence.
- Explains the history and state of the art of all of the sub-fields which contribute to the topic of computational paralinguistics.
- Covers the signal processing and machine learning aspects of the actual computational modelling of emotion and personality, and explains the detection process from corpus collection to feature extraction and from model testing to system integration.
- Details aspects of real-world system integration including distribution, weakly supervised learning and confidence measures.
- Outlines machine learning approaches including static, dynamic and context-sensitive algorithms for classification and regression.
- Includes a tutorial on freely available toolkits, such as the open-source 'openEAR' toolkit for emotion and affect recognition co-developed by one of the authors, and a listing of standard databases and feature sets used in the field, allowing for immediate experimentation so that the reader can build an emotion detection model on an existing corpus.


Author Notes

Björn Schuller, Technische Universität München, Germany

Anton Batliner, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany


Table of Contents

Preface p. xiii
Acknowledgements p. xv
List of Abbreviations p. xvii
Part I Foundations
1 Introduction p. 3
1.1 What is Computational Paralinguistics? A First Approximation p. 3
1.2 History and Subject Area p. 7
1.3 Form versus Function p. 10
1.4 Further Aspects p. 12
1.4.1 The Synthesis of Emotion and Personality p. 12
1.4.2 Multimodality: Analysis and Generation p. 13
1.4.3 Applications, Usability and Ethics p. 15
1.5 Summary and Structure of the Book p. 17
References p. 18
2 Taxonomies p. 21
2.1 Traits versus States p. 21
2.2 Acted versus Spontaneous p. 25
2.3 Complex versus Simple p. 30
2.4 Measured versus Assessed p. 31
2.5 Categorical versus Continuous p. 33
2.6 Felt versus Perceived p. 35
2.7 Intentional versus Instinctual p. 37
2.8 Consistent versus Discrepant p. 38
2.9 Private versus Social p. 39
2.10 Prototypical versus Peripheral p. 40
2.11 Universal versus Culture-Specific p. 41
2.12 Unimodal versus Multimodal p. 43
2.13 All These Taxonomies - So What? p. 44
2.13.1 Emotion Data: The FAU AEC p. 45
2.13.2 Non-native Data: The C-AuDiT Corpus p. 47
References p. 48
3 Aspects of Modelling p. 53
3.1 Theories and Models of Personality p. 53
3.2 Theories and Models of Emotion and Affect p. 55
3.3 Type and Segmentation of Units p. 58
3.4 Typical versus Atypical Speech p. 60
3.5 Context p. 61
3.6 Lab versus Life, or Through the Looking Glass p. 62
3.7 Sheep and Goats, or Single Instance Decision versus Cumulative Evidence and Overall Performance p. 64
3.8 The Few and the Many, or How to Analyse a Hamburger p. 65
3.9 Reifications, and What You are Looking for is What You Get p. 67
3.10 Magical Numbers versus Sound Reasoning p. 68
References p. 74
4 Formal Aspects p. 79
4.1 The Linguistic Code and Beyond p. 79
4.2 The Non-Distinctive Use of Phonetic Elements p. 81
4.2.1 Segmental Level: The Case of /r/ Variants p. 81
4.2.2 Supra-segmental Level: The Case of Pitch and Fundamental Frequency - and of Other Prosodic Parameters p. 82
4.2.3 In Between: The Case of Other Voice Qualities, Especially Laryngealisation p. 86
4.3 The Non-Distinctive Use of Linguistic Elements p. 91
4.3.1 Words and Word Classes p. 91
4.3.2 Phrase Level: The Case of Filler Phrases and Hedges p. 94
4.4 Disfluencies p. 96
4.5 Non-Verbal, Vocal Events p. 98
4.6 Common Traits of Formal Aspects p. 100
References p. 101
5 Functional Aspects p. 107
5.1 Biological Trait Primitives p. 109
5.1.1 Speaker Characteristics p. 111
5.2 Cultural Trait Primitives p. 112
5.2.1 Speech Characteristics p. 114
5.3 Personality p. 115
5.4 Emotion and Affect p. 119
5.5 Subjectivity and Sentiment Analysis p. 123
5.6 Deviant Speech p. 124
5.6.1 Pathological Speech p. 125
5.6.2 Temporarily Deviant Speech p. 129
5.6.3 Non-native Speech p. 130
5.7 Social Signals p. 131
5.8 Discrepant Communication p. 135
5.8.1 Indirect Speech, Irony, and Sarcasm p. 136
5.8.2 Deceptive Speech p. 138
5.8.3 Off-Talk p. 139
5.9 Common Traits of Functional Aspects p. 140
References p. 141
6 Corpus Engineering p. 159
6.1 Annotation p. 160
6.1.1 Assessment of Annotations p. 161
6.1.2 New Trends p. 164
6.2 Corpora and Benchmarks: Some Examples p. 164
6.2.1 FAU Aibo Emotion Corpus p. 165
6.2.2 aGender Corpus p. 165
6.2.3 TUM AVIC Corpus p. 166
6.2.4 Alcohol Language Corpus p. 168
6.2.5 Sleepy Language Corpus p. 168
6.2.6 Speaker Personality Corpus p. 169
6.2.7 Speaker Likability Database p. 170
6.2.8 NKI CCRT Speech Corpus p. 171
6.2.9 TIMET Database p. 171
6.2.10 Final Remarks on Databases p. 172
References p. 173
Part II Modelling
7 Computational Modelling of Paralinguistics: Overview p. 179
References p. 183
8 Acoustic Features p. 185
8.1 Digital Signal Representation p. 185
8.2 Short-Time Analysis p. 187
8.3 Acoustic Segmentation p. 190
8.4 Continuous Descriptors p. 190
8.4.1 Intensity p. 190
8.4.2 Zero Crossings p. 191
8.4.3 Autocorrelation p. 192
8.4.4 Spectrum and Cepstrum p. 194
8.4.5 Linear Prediction p. 198
8.4.6 Line Spectral Pairs p. 202
8.4.7 Perceptual Linear Prediction p. 203
8.4.8 Formants p. 205
8.4.9 Fundamental Frequency and Voicing Probability p. 207
8.4.10 Jitter and Shimmer p. 212
8.4.11 Derived Low-Level Descriptors p. 214
References p. 214
9 Linguistic Features p. 217
9.1 Textual Descriptors p. 217
9.2 Preprocessing p. 218
9.3 Reduction p. 218
9.3.1 Stopping p. 218
9.3.2 Stemming p. 219
9.3.3 Tagging p. 219
9.4 Modelling p. 220
9.4.1 Vector Space Modelling p. 220
9.4.2 On-line Knowledge p. 222
References p. 227
10 Supra-segmental Features p. 230
10.1 Functionals p. 231
10.2 Feature Brute-Forcing p. 232
10.3 Feature Stacking p. 233
References p. 234
11 Machine-Based Modelling p. 235
11.1 Feature Relevance Analysis p. 235
11.2 Machine Learning p. 238
11.2.1 Static Classification p. 238
11.2.2 Dynamic Classification: Hidden Markov Models p. 256
11.2.3 Regression p. 262
11.3 Testing Protocols p. 264
11.3.1 Partitioning p. 264
11.3.2 Balancing p. 266
11.3.3 Performance Measures p. 267
11.3.4 Result Interpretation p. 272
References p. 277
12 System Integration and Application p. 281
12.1 Distributed Processing p. 281
12.2 Autonomous and Collaborative Learning p. 284
12.3 Confidence Measures p. 286
References p. 287
13 'Hands-On': Existing Toolkits and Practical Tutorial p. 289
13.1 Related Toolkits p. 289
13.2 openSMILE p. 290
13.2.1 Available Feature Extractors p. 293
13.3 Practical Computational Paralinguistics How-to p. 294
13.3.1 Obtaining and Installing openSMILE p. 295
13.3.2 Extracting Features p. 295
13.3.3 Classification and Regression p. 302
References p. 303
14 Epilogue p. 304
Appendix p. 307
A.1 openSMILE Feature Sets Used at Interspeech Challenges p. 307
A.2 Feature Encoding Scheme p. 310
References p. 314
Index p. 315