Title:
Computational paralinguistics : emotion, affect and personality in speech and language processing
Personal Author:
Schuller, Björn
Publication Information:
Hoboken, N.J. : John Wiley & Sons Inc., 2014
Physical Description:
xxi, 321 p. : ill. ; 25 cm.
ISBN:
9781119971368
Added Author:
Batliner, Anton

Available:

Item Barcode: 30000010334660
Call Number: P37.5.D37 S34 2014
Material Type: Open Access Book
Item Category 1: Book

On Order

Summary

This book presents the methods, tools and techniques currently used to automatically recognise the affect, emotion, personality and everything else beyond linguistics ('paralinguistics') expressed by or embedded in human speech and language.

It is the first book to provide such a systematic survey of paralinguistics in speech and language processing. The technology described has evolved mainly from automatic speech and speaker recognition and processing, but also takes into account recent developments within speech signal processing, machine intelligence and data mining.

Moreover, the book offers a hands-on approach by integrating actual data sets, software, and open-source utilities which will make the book invaluable as a teaching tool and similarly useful for those professionals already in the field.

Key features:

- Provides an integrated presentation of basic research (in phonetics/linguistics and humanities) with state-of-the-art engineering approaches for speech signal processing and machine intelligence.
- Explains the history and state of the art of all of the sub-fields which contribute to the topic of computational paralinguistics.
- Covers the signal processing and machine learning aspects of the actual computational modelling of emotion and personality, and explains the detection process from corpus collection to feature extraction and from model testing to system integration.
- Details aspects of real-world system integration including distribution, weakly supervised learning and confidence measures.
- Outlines machine learning approaches including static, dynamic and context-sensitive algorithms for classification and regression.
- Includes a tutorial on freely available toolkits, such as the open-source 'openEAR' toolkit for emotion and affect recognition co-developed by one of the authors, and a listing of standard databases and feature sets used in the field, allowing for immediate experimentation so that the reader can build an emotion detection model on an existing corpus.


Author Notes

Björn Schuller, Technische Universität München, Germany

Anton Batliner, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany


Table of Contents

Preface p. xiii
Acknowledgements p. xv
List of Abbreviations p. xvii
Part I Foundations
1 Introduction p. 3
1.1 What is Computational Paralinguistics? A First Approximation p. 3
1.2 History and Subject Area p. 7
1.3 Form versus Function p. 10
1.4 Further Aspects p. 12
1.4.1 The Synthesis of Emotion and Personality p. 12
1.4.2 Multimodality: Analysis and Generation p. 13
1.4.3 Applications, Usability and Ethics p. 15
1.5 Summary and Structure of the Book p. 17
References p. 18
2 Taxonomies p. 21
2.1 Traits versus States p. 21
2.2 Acted versus Spontaneous p. 25
2.3 Complex versus Simple p. 30
2.4 Measured versus Assessed p. 31
2.5 Categorical versus Continuous p. 33
2.6 Felt versus Perceived p. 35
2.7 Intentional versus Instinctual p. 37
2.8 Consistent versus Discrepant p. 38
2.9 Private versus Social p. 39
2.10 Prototypical versus Peripheral p. 40
2.11 Universal versus Culture-Specific p. 41
2.12 Unimodal versus Multimodal p. 43
2.13 All These Taxonomies - So What? p. 44
2.13.1 Emotion Data: The FAU AEC p. 45
2.13.2 Non-native Data: The C-AuDiT Corpus p. 47
References p. 48
3 Aspects of Modelling p. 53
3.1 Theories and Models of Personality p. 53
3.2 Theories and Models of Emotion and Affect p. 55
3.3 Type and Segmentation of Units p. 58
3.4 Typical versus Atypical Speech p. 60
3.5 Context p. 61
3.6 Lab versus Life, or Through the Looking Glass p. 62
3.7 Sheep and Goats, or Single Instance Decision versus Cumulative Evidence and Overall Performance p. 64
3.8 The Few and the Many, or How to Analyse a Hamburger p. 65
3.9 Reifications, and What You are Looking for is What You Get p. 67
3.10 Magical Numbers versus Sound Reasoning p. 68
References p. 74
4 Formal Aspects p. 79
4.1 The Linguistic Code and Beyond p. 79
4.2 The Non-Distinctive Use of Phonetic Elements p. 81
4.2.1 Segmental Level: The Case of /r/ Variants p. 81
4.2.2 Supra-segmental Level: The Case of Pitch and Fundamental Frequency - and of Other Prosodic Parameters p. 82
4.2.3 In Between: The Case of Other Voice Qualities, Especially Laryngealisation p. 86
4.3 The Non-Distinctive Use of Linguistic Elements p. 91
4.3.1 Words and Word Classes p. 91
4.3.2 Phrase Level: The Case of Filler Phrases and Hedges p. 94
4.4 Disfluencies p. 96
4.5 Non-Verbal, Vocal Events p. 98
4.6 Common Traits of Formal Aspects p. 100
References p. 101
5 Functional Aspects p. 107
5.1 Biological Trait Primitives p. 109
5.1.1 Speaker Characteristics p. 111
5.2 Cultural Trait Primitives p. 112
5.2.1 Speech Characteristics p. 114
5.3 Personality p. 115
5.4 Emotion and Affect p. 119
5.5 Subjectivity and Sentiment Analysis p. 123
5.6 Deviant Speech p. 124
5.6.1 Pathological Speech p. 125
5.6.2 Temporarily Deviant Speech p. 129
5.6.3 Non-native Speech p. 130
5.7 Social Signals p. 131
5.8 Discrepant Communication p. 135
5.8.1 Indirect Speech, Irony, and Sarcasm p. 136
5.8.2 Deceptive Speech p. 138
5.8.3 Off-Talk p. 139
5.9 Common Traits of Functional Aspects p. 140
References p. 141
6 Corpus Engineering p. 159
6.1 Annotation p. 160
6.1.1 Assessment of Annotations p. 161
6.1.2 New Trends p. 164
6.2 Corpora and Benchmarks: Some Examples p. 164
6.2.1 FAU Aibo Emotion Corpus p. 165
6.2.2 aGender Corpus p. 165
6.2.3 TUM AVIC Corpus p. 166
6.2.4 Alcohol Language Corpus p. 168
6.2.5 Sleepy Language Corpus p. 168
6.2.6 Speaker Personality Corpus p. 169
6.2.7 Speaker Likability Database p. 170
6.2.8 NKI CCRT Speech Corpus p. 171
6.2.9 TIMET Database p. 171
6.2.10 Final Remarks on Databases p. 172
References p. 173
Part II Modelling
7 Computational Modelling of Paralinguistics: Overview p. 179
References p. 183
8 Acoustic Features p. 185
8.1 Digital Signal Representation p. 185
8.2 Short-Time Analysis p. 187
8.3 Acoustic Segmentation p. 190
8.4 Continuous Descriptors p. 190
8.4.1 Intensity p. 190
8.4.2 Zero Crossings p. 191
8.4.3 Autocorrelation p. 192
8.4.4 Spectrum and Cepstrum p. 194
8.4.5 Linear Prediction p. 198
8.4.6 Line Spectral Pairs p. 202
8.4.7 Perceptual Linear Prediction p. 203
8.4.8 Formants p. 205
8.4.9 Fundamental Frequency and Voicing Probability p. 207
8.4.10 Jitter and Shimmer p. 212
8.4.11 Derived Low-Level Descriptors p. 214
References p. 214
9 Linguistic Features p. 217
9.1 Textual Descriptors p. 217
9.2 Preprocessing p. 218
9.3 Reduction p. 218
9.3.1 Stopping p. 218
9.3.2 Stemming p. 219
9.3.3 Tagging p. 219
9.4 Modelling p. 220
9.4.1 Vector Space Modelling p. 220
9.4.2 On-line Knowledge p. 222
References p. 227
10 Supra-segmental Features p. 230
10.1 Functionals p. 231
10.2 Feature Brute-Forcing p. 232
10.3 Feature Stacking p. 233
References p. 234
11 Machine-Based Modelling p. 235
11.1 Feature Relevance Analysis p. 235
11.2 Machine Learning p. 238
11.2.1 Static Classification p. 238
11.2.2 Dynamic Classification: Hidden Markov Models p. 256
11.2.3 Regression p. 262
11.3 Testing Protocols p. 264
11.3.1 Partitioning p. 264
11.3.2 Balancing p. 266
11.3.3 Performance Measures p. 267
11.3.4 Result Interpretation p. 272
References p. 277
12 System Integration and Application p. 281
12.1 Distributed Processing p. 281
12.2 Autonomous and Collaborative Learning p. 284
12.3 Confidence Measures p. 286
References p. 287
13 'Hands-On': Existing Toolkits and Practical Tutorial p. 289
13.1 Related Toolkits p. 289
13.2 openSMILE p. 290
13.2.1 Available Feature Extractors p. 293
13.3 Practical Computational Paralinguistics How-to p. 294
13.3.1 Obtaining and Installing openSMILE p. 295
13.3.2 Extracting Features p. 295
13.3.3 Classification and Regression p. 302
References p. 303
14 Epilogue p. 304
Appendix p. 307
A.1 openSMILE Feature Sets Used at Interspeech Challenges p. 307
A.2 Feature Encoding Scheme p. 310
References p. 314
Index p. 315