Title:
Human factors and voice interactive systems
Series:
Signals and communication technology
Edition:
2nd ed.
Publication Information:
New York, NY : Springer, 2006
Physical Description:
xxvi, 468 p. : ill. ; 24 cm.
ISBN:
9780387254821

Availability:

Item Barcode: 30000010196872
Call Number: TK7882.S65 H85 2006
Material Type: Book
Item Category: Open Access Book

Summary

The second edition of Human Factors and Voice Interactive Systems, in addition to updating chapters from the first edition, adds in-depth information on current topics of major interest to speech application developers. These topics include the use of speech technologies in automobiles, speech in mobile phones, natural language dialogue issues in speech application design, and the human factors design, testing, and evaluation of interactive voice response (IVR) applications.


Table of Contents

Contributors: Bernhard Suhm; Susan J. Boyce; Osamuyimen T. Stewart and Harry E. Blanchard; Amir M. Mane and Esther Levin; Dragos Burileanu; Geza Nemeth, Geza Kiss, Csaba Zainko, Gabor Olaszy, and Balint Toth; Harry E. Blanchard and Steven H. Lewis; Matthew Yuschik; Nicole Yankelovich; Dimitri Kanevsky; Michel Divay; Maria Gosy and Magdolna Kovacs; Michel Divay and Ed Bruckert; John C. Thomas, Sara Basson, and Daryle Gardner-Bonneau; Maria Gosy

1 IVR Usability Engineering Using Guidelines and Analyses of End-To-End Calls  p. 1
1 IVR Design Principles and Guidelines  p. 2
1.1 A Taxonomy of Limitations of Speech User Interfaces  p. 3
1.1.1 Limitations of Speech Recognition  p. 4
1.1.2 Limitations of Spoken Language  p. 7
1.1.3 Human Cognition  p. 9
1.2 Towards Best Practices for IVR Design  p. 10
1.2.1 A Database for Speech User Interface Design Knowledge  p. 10
1.2.2 Compiling Guidelines for IVR Design  p. 11
1.2.3 Applying IVR Design Guidelines in Practice  p. 13
1.3 Best Practices for IVR Design?  p. 18
2 Data-Driven IVR Usability Engineering Based on End-To-End Calls  p. 19
2.1 The Flaws of Standard IVR Reports  p. 20
2.2 Capturing End-to-End Data from Calls  p. 20
2.3 Evaluating IVR Usability Based on End-to-End Calls  p. 23
2.3.1 Call-Reason Distribution  p. 23
2.3.2 Diagnosing IVR Usability Using Caller-Path Diagrams  p. 24
2.3.3 IVR Usability Analysis Using Call-Reason Distribution and Caller-Path Diagrams  p. 27
2.4 Evaluating IVR Cost-Effectiveness  p. 29
2.4.1 Defining Total IVR Benefit  p. 30
2.4.2 Measuring Total IVR Benefit  p. 31
2.4.3 Estimating Improvement Potential  p. 34
2.4.4 Building the Business Case for IVR Redesign  p. 35
3 Summary and Conclusions  p. 38
Acknowledgements  p. 39
References  p. 39
2 User Interface Design for Natural Language Systems: From Research to Reality  p. 43
1 Introduction  p. 43
1.1 What is Natural Language?  p. 43
1.1.1 Natural Language for Call Routing  p. 44
1.1.2 Natural Language for Form Filling  p. 45
1.1.3 The Pros and Cons of Natural Language Interfaces  p. 45
1.2 What Are the Steps to Building a Natural Language Application?  p. 46
1.2.1 Data Collection  p. 46
1.2.2 Annotation Guide Development  p. 47
1.2.3 Call Flow Development and Annotation  p. 48
1.2.4 Application Code and Grammar/NL Development  p. 49
1.2.5 Testing NL Applications  p. 49
1.2.6 Post-Deployment Tuning  p. 49
1.3 When Does it Make Sense to Use Natural Language?  p. 50
1.3.1 Distribution of Calls  p. 50
1.3.2 Characteristics of the Caller Population  p. 51
1.3.3 Evidence Obtained from Data with Existing Application  p. 53
1.3.4 Ease of Getting to an Agent  p. 53
1.3.5 Live Caller Environment Versus IVR: What is Being Replaced?  p. 53
1.4 The Call Routing Task  p. 54
1.5 Design Process  p. 54
1.6 Analysis of Human-to-Human Dialogues  p. 55
2 Anthropomorphism and User Expectations  p. 55
2.1 Anthropomorphism Experiment  p. 56
3 Issues for Natural Dialogue Design  p. 60
3.1 Initial Greeting  p. 60
3.2 Confirmations  p. 60
3.3 Disambiguating an Utterance  p. 61
3.4 Reprompts  p. 61
3.5 Turn-Taking  p. 62
3.6 When to Bail Out  p. 62
4 Establishing User Expectations in the Initial Greeting  p. 62
4.1 Initial Greeting Experiment  p. 63
5 Identifying Recognition Errors Through Confirmations  p. 66
5.1 Confirming Digit Strings in Spoken Dialogue Systems  p. 67
5.2 Confirmation of Topic in a Spoken Natural Dialogue System  p. 69
6 Repairing Recognition Errors With Reprompts  p. 72
6.1 Reprompt Experiment  p. 73
7 Turn-Taking in Human-Machine Dialogues  p. 76
7.1 Caller Tolerance of System Delay  p. 77
8 Summary  p. 79
References  p. 79
3 Linguistics and Psycholinguistics in IVR Design  p. 81
1 Introduction  p. 82
1.1 Speech Sounds  p. 82
1.2 Grammar  p. 83
1.2.1 Words  p. 84
1.2.2 Sentences  p. 84
1.2.3 Meaning  p. 85
2 ASR Grammars and Language Understanding  p. 86
2.1 Morphology  p. 87
2.2 Syntax  p. 88
2.3 Semantics  p. 93
2.3.1 Synonyms  p. 93
2.3.2 Polysemy  p. 94
2.4 Putting it All Together  p. 94
2.5 ASR Grammars  p. 95
2.6 Natural Language Understanding Models  p. 97
2.6.1 The Semantic Taxonomy  p. 98
2.6.2 Establishing Predicates  p. 100
3 Dialog Design  p. 102
3.1 Putting it All Together  p. 105
3.1.1 Scenario 1  p. 106
3.1.2 Scenario 2  p. 107
4 Consequences of Structural Simplification  p. 108
4.1 Semantic Specificity  p. 111
4.2 Syntactic Specificity  p. 112
Conclusion  p. 113
References  p. 113
4 Designing the Voice User Interface for Automated Directory Assistance  p. 117
1 The Business of DA  p. 117
1.1 The Introduction of Automation  p. 118
1.2 Early Attempts to Use Speech Recognition  p. 119
2 Issues in the Design of VUI for DA  p. 121
2.1 Addressing Database Inadequacies  p. 122
2.1.1 The Solution: Automated Data Cleaning  p. 123
2.2 Pronunciation of Names  p. 123
2.3 The First Question  p. 124
2.4 Finding the Locality  p. 124
2.5 Confirming the Locality  p. 125
2.6 Determining the Listing Type  p. 126
2.7 Handling Business Requests  p. 127
2.7.1 Issues in Grammar Design for Business Listing Automation  p. 127
2.7.2 Business Listings Disambiguation  p. 130
2.8 Handling Residential Listings  p. 131
2.9 General Dialogue Design Issues  p. 133
3 Final Thoughts  p. 134
References  p. 134
5 Spoken Language Interfaces for Embedded Applications  p. 135
1 Introduction  p. 135
2 Spoken Language Interfaces Development  p. 137
2.1 Overview: Current Trends  p. 137
2.2 Embedded Speech Applications  p. 139
3 Embedded Speech Technologies  p. 141
3.1 Technical Constraints and Implementation Methods  p. 141
3.2 Embedded Speech Recognition  p. 143
3.3 Embedded Speech Synthesis  p. 149
4 A Case Study: An Embedded TTS System Implementation  p. 153
4.1 A Simplified TTS System Architecture  p. 153
4.2 Implementation Issues  p. 155
5 The Future of Embedded Speech Interfaces  p. 158
References  p. 160
6 Speech Generation in Mobile Phones  p. 163
1 Introduction  p. 163
2 Speaking Telephone? What is it Good for?  p. 165
3 Speech Generation Technologies in Mobile Phones  p. 166
3.1 Synthesis Technologies  p. 167
3.1.1 Limited Vocabulary Concatenation  p. 167
3.1.2 Unlimited Text Reading - Text-To-Speech  p. 168
3.2 Topic-Related Text Preprocessing  p. 170
3.2.1 Exceptions Vocabulary  p. 171
3.2.2 Complex Text Transformation  p. 171
3.2.3 Language Identification  p. 174
4 How to Port Speech Synthesis to a Phone Platform  p. 178
5 Limitations and Possibilities Offered by Phone Resources  p. 181
6 Implementations  p. 183
6.1 The Mobile Phone as a Speaking Aid  p. 183
6.2 An SMS-Reading Mobile Phone Application  p. 186
Acknowledgements  p. 190
References  p. 190
7 Voice Messaging User Interface  p. 193
1 Introduction  p. 193
2 The Touch-Tone Voice Mail User Interface  p. 196
2.1 Common Elements of Touch-Tone Transactions  p. 197
2.1.1 Prompts  p. 197
2.1.2 Interruptibility  p. 198
2.1.3 Time-outs and Reprompts  p. 199
2.1.4 Feedback  p. 200
2.1.5 Feedback to Errors  p. 200
2.1.6 Menu Length  p. 200
2.1.7 Mapping of Keys to Options  p. 201
2.1.8 Global Commands  p. 201
2.1.9 Use of the "#" and "*" Keys  p. 202
2.1.10 Unprompted Options  p. 202
2.1.11 Voice and Personality  p. 203
2.2 Call Answering  p. 203
2.2.1 Call Answering Greetings  p. 206
2.3 The Subscriber Interface  p. 206
2.4 Retrieving and Manipulating Messages  p. 206
2.5 Sending Messages  p. 209
2.6 Voice Messaging User Interface Standards  p. 211
2.7 Alternative Approaches to Traditional Touch-Tone Design  p. 214
3 Automatic Speech Recognition and Voice Mail  p. 215
4 Unified Messaging and Multimedia Mail  p. 219
4.1 Fax Messaging  p. 220
4.2 Viewing Voice Mail  p. 221
4.3 Listening to E-mail  p. 223
4.4 Putting it All Together  p. 224
4.5 Mixed Media  p. 225
References  p. 226
8 Silence Locations and Durations in Dialog Management  p. 231
1 Introduction  p. 231
2 Prompts and Responses in Dialog Management  p. 233
2.1 Dialog Management  p. 233
2.2 Word Selection  p. 234
2.3 Word Lists  p. 234
2.4 Turn-Taking Cues  p. 236
3 Time as an Independent Variable - Dialog Model  p. 236
3.1 Definition of Terms  p. 237
3.2 Examples of Usage  p. 238
4 User Behavior  p. 238
4.1 Transactional Analysis  p. 238
4.2 Verbal Communication  p. 239
4.3 Directed Dialogs  p. 239
5 Measurements  p. 240
5.1 Barge-In  p. 241
6 Usability Testing and Results  p. 242
6.1 Test Results - United States (early prototype)  p. 244
6.2 Test Results - United States (tuned, early prototype)  p. 245
6.3 Test Results - United Kingdom  p. 246
6.4 Test Results - Italy  p. 247
6.5 Test Results - Denmark  p. 249
7 Observations and Interpretations  p. 250
7.1 Lateral Results  p. 250
7.2 Learning - Longitudinal Results  p. 251
Conclusions  p. 252
Acknowledgement  p. 252
References  p. 252
9 Using Natural Dialogs as the Basis for Speech Interface Design  p. 255
1 Introduction  p. 256
1.1 Motivation  p. 256
1.2 Natural Dialog Studies  p. 257
2 Natural Dialog Case Studies  p. 258
2.1 Study #1: SpeechActs Calendar (speech-only, telephone-based)  p. 259
2.1.1 Purpose of Application  p. 259
2.1.2 Study Design  p. 260
2.1.3 Software Design  p. 262
2.1.4 Lessons Learned  p. 264
2.2 Study #2: Office Monitor (speech-only, microphone-based)  p. 264
2.2.1 Purpose of Application  p. 264
2.2.2 Study Design  p. 265
2.2.3 Software Design  p. 267
2.2.4 Lessons Learned  p. 269
2.3 Study #3: Automated Customer Service Representative (speech input, speech/graphical output, telephone-based)  p. 269
2.3.1 Purpose of Application  p. 269
2.3.2 Study Design  p. 269
2.3.3 Software Design  p. 275
2.3.4 Lessons Learned  p. 278
2.4 Study #4: Multimodal Drawing (speech/mouse/keyboard input, speech/graphical output, microphone-based)  p. 278
2.4.1 Purpose of Application  p. 278
2.4.2 Study Design  p. 279
2.4.3 Software Design  p. 283
2.4.4 Lessons Learned  p. 286
3 Discussion  p. 286
3.1 Refining Application Requirements and Functionality  p. 286
3.2 Collecting Appropriate Vocabulary  p. 287
3.3 Determining Commonly Used Grammatical Constructs  p. 287
3.4 Discovering Effective Interaction Patterns  p. 287
3.5 Helping with Prompt and Feedback Design  p. 288
3.6 Getting a Feeling for the Tone of the Conversations  p. 288
Conclusion  p. 289
Acknowledgements  p. 289
References  p. 290
10 Telematics: Artificial Passenger and Beyond  p. 291
1 Introduction  p. 291
2 A Brief Overview of IBM Voice Technologies  p. 292
2.1 Conversational Interactivity for Telematics  p. 293
2.2 System Architecture  p. 295
2.3 Embedded Speech Recognition  p. 297
2.4 Distributed Speech Recognition  p. 299
3 Evaluating/Predicting the Consequences of Misrecognitions  p. 300
4 Improving Voice and State Recognition Performance - Network Data Collection, Learning by Example, Adaptation of Language and Acoustic Models for Similar Users  p. 303
5 Artificial Passenger  p. 308
6 User Modeling Aspects  p. 315
6.1 User Model  p. 316
6.2 The Adaptive Modeling Process  p. 317
6.3 The Control Process  p. 318
6.4 Discussion about Time-Lagged Observables and Indicators in a History  p. 319
7 Gesture-Based Command Interface  p. 320
8 Summary  p. 322
Acknowledgements  p. 323
References  p. 323
11 A Language to Write Letter-To-Sound Rules for English and French  p. 327
1 Introduction  p. 327
2 The Historic Evolution of English and French  p. 329
3 The Complexity of the Conversion for English and French  p. 329
4 Rule Formalism  p. 334
5 Examples of Rules for English  p. 340
6 Examples of Rules for French  p. 345
Conclusions  p. 353
References  p. 354
Appendices for French  p. 356
Appendices for English  p. 359
12 Virtual Sentences of Spontaneous Speech: Boundary Effects of Syntactic-Semantic-Prosodic Properties  p. 361
1 Introduction  p. 361
2 Method and Material  p. 364
2.1 Subjects  p. 364
2.2 Speech Material  p. 364
2.3 Procedure  p. 365
3 Results  p. 366
3.1 Identification of Virtual Sentences in the Normal and Filtered Speech Samples  p. 366
3.2 Pauses of the Speech Sample  p. 368
3.3 Pause Perception  p. 370
3.4 F0 Patterns  p. 372
3.5 Comprehension of the Spontaneous Speech Sample  p. 374
3.6 The Factor of Gender  p. 375
Conclusions  p. 375
Acknowledgements  p. 377
References  p. 377
13 Text-to-Speech Formant Synthesis for French  p. 381
1 Introduction  p. 381
2 Grapheme-to-Phoneme Conversion  p. 382
2.1 Normalization: From Grapheme to Grapheme  p. 382
2.2 From Grapheme to Phoneme  p. 384
2.3 Exception Dictionary  p. 385
3 Prosody  p. 385
3.1 Parsing the Text  p. 385
3.2 Intonation  p. 386
3.3 Phoneme Duration  p. 391
4 Acoustics for French Consonants and Vowels  p. 398
4.1 Vowels  p. 398
4.2 Fricatives (unvoiced: F, S, Ch; voiced: V, Z, J)  p. 400
4.3 Plosives (unvoiced: P, T, K; voiced: B, D, G)  p. 401
4.4 Nasals (M, N, Gn, Ng)  p. 403
4.5 Liquids (L, R)  p. 404
4.6 Semivowels (Y, W, Wu)  p. 405
4.7 Phoneme Transitions (coarticulation effects)  p. 405
4.8 Frame Generation  p. 409
4.9 Conclusions for Acoustics  p. 409
5 From Acoustics to Speech Signal  p. 410
6 Next Generation Formant Synthesis  p. 412
7 Singing  p. 414
Conclusions  p. 414
References  p. 415
14 Accessibility and Speech Technology: Advancing Toward Universal Access  p. 417
1 Universal Access vs. Assistive Technology  p. 417
2 Predicted Enhancements and Improvements to Underlying Technology  p. 419
2.1 Social Network Analysis, Blogs, Wikis, and Social Computing  p. 420
2.2 Intelligent Agents  p. 421
2.3 Learning Objects  p. 422
2.4 Cognitive Aids  p. 423
2.5 Interface Flexibility and Intelligence  p. 423
3 Current Assistive Technology Applications Employing Speech Technology  p. 423
3.1 Applications Employing Automatic Speech Recognition (ASR)  p. 424
3.2 Applications of Synthetic Speech  p. 428
4 Human-Computer Interaction: Design and Evaluation  p. 430
5 The Role of Technical Standards in Accessibility  p. 433
5.1 Standards Related to Software and Information Technology User Interfaces  p. 434
5.2 Speech Application Accessibility Standards  p. 434
5.3 Accessibility Data and Accessibility Guidance for General Products  p. 437
Conclusions  p. 439
References  p. 440
15 Synthesized Speech Used for the Evaluation of Children's Hearing and Speech Perception  p. 443
1 Introduction  p. 443
2 The Background Theory  p. 444
3 The Production of the Synthesized Word Material  p. 447
4 Pre-Experiments for the Application of Synthesized Words for Hearing Screening  p. 449
5 Results  p. 450
5.1 Clinical Tests  p. 450
5.2 Screening Procedure  p. 453
5.3 Evaluation of Acoustic-Phonetic Perception  p. 456
5.4 Children with Specific Needs  p. 457
Conclusions  p. 458
Acknowledgements  p. 459
References  p. 459
Index  p. 461