Title:
Human factors and voice interactive systems
Series:
Signals and communication technology
Edition:
2nd ed.
Publication Information:
New York, NY : Springer, 2006
Physical Description:
xxvi, 468 p. : ill. ; 24 cm.
ISBN:
9780387254821
Available:
Item Barcode: 30000010196872
Call Number: TK7882.S65 H85 2006
Material Type: Open Access Book
Item Category: Book
Status: On Order
Summary
The second edition of Human Factors and Voice Interactive Systems, in addition to updating chapters from the first edition, adds in-depth information on current topics of major interest to speech application developers. These topics include use of speech technologies in automobiles, speech in mobile phones, natural language dialogue issues in speech application design, and the human factors design, testing, and evaluation of interactive voice response (IVR) applications.
Table of Contents
1 IVR Usability Engineering Using Guidelines and Analyses of End-To-End Calls | p. 1 |
1 IVR Design Principles and Guidelines | p. 2 |
1.1 A Taxonomy of Limitations of Speech User Interfaces | p. 3 |
1.1.1 Limitations of Speech Recognition | p. 4 |
1.1.2 Limitations of Spoken Language | p. 7 |
1.1.3 Human Cognition | p. 9 |
1.2 Towards Best Practices for IVR Design | p. 10 |
1.2.1 A Database for Speech User Interface Design Knowledge | p. 10 |
1.2.2 Compiling Guidelines for IVR Design | p. 11 |
1.2.3 Applying IVR Design Guidelines in Practice | p. 13 |
1.3 Best Practices for IVR Design? | p. 18 |
2 Data-Driven IVR Usability Engineering Based on End-To-End Calls | p. 19 |
2.1 The Flaws of Standard IVR Reports | p. 20 |
2.2 Capturing End-to-End Data from Calls | p. 20 |
2.3 Evaluating IVR Usability Based on End-to-End Calls | p. 23 |
2.3.1 Call-Reason Distribution | p. 23 |
2.3.2 Diagnosing IVR Usability Using Caller-Path Diagrams | p. 24 |
2.3.3 IVR Usability Analysis using Call-Reason Distribution and Caller-Path Diagrams | p. 27 |
2.4 Evaluating IVR Cost-effectiveness | p. 29 |
2.4.1 Defining Total IVR Benefit | p. 30 |
2.4.2 Measuring Total IVR Benefit | p. 31 |
2.4.3 Estimating Improvement Potential | p. 34 |
2.4.4 Building the Business Case for IVR Redesign | p. 35 |
3 Summary and Conclusions | p. 38 |
Acknowledgements | p. 39 |
References | p. 39 |
2 User Interface Design for Natural Language Systems: From Research to Reality | p. 43 |
1 Introduction | p. 43 |
1.1 What is Natural Language? | p. 43 |
1.1.1 Natural Language for Call Routing | p. 44 |
1.1.2 Natural Language for Form Filling | p. 45 |
1.1.3 The Pros and Cons of Natural Language Interfaces | p. 45 |
1.2 What Are the Steps to Building a Natural Language Application? | p. 46 |
1.2.1 Data Collection | p. 46 |
1.2.2 Annotation Guide Development | p. 47 |
1.2.3 Call Flow Development and Annotation | p. 48 |
1.2.4 Application Code and Grammar/NL Development | p. 49 |
1.2.5 Testing NL Applications | p. 49 |
1.2.6 Post-Deployment Tuning | p. 49 |
1.3 When Does it Make Sense to use Natural Language? | p. 50 |
1.3.1 Distribution of Calls | p. 50 |
1.3.2 Characteristics of the Caller Population | p. 51 |
1.3.3 Evidence Obtained from Data with Existing Application | p. 53 |
1.3.4 Ease of Getting to an Agent | p. 53 |
1.3.5 Live Caller Environment Versus IVR: What is Being Replaced? | p. 53 |
1.4 The Call Routing Task | p. 54 |
1.5 Design Process | p. 54 |
1.6 Analysis of Human-to-Human Dialogues | p. 55 |
2 Anthropomorphism and User Expectations | p. 55 |
2.1 Anthropomorphism Experiment | p. 56 |
3 Issues for Natural Dialogue Design | p. 60 |
3.1 Initial Greeting | p. 60 |
3.2 Confirmations | p. 60 |
3.3 Disambiguating an Utterance | p. 61 |
3.4 Reprompts | p. 61 |
3.5 Turn-taking | p. 62 |
3.6 When to Bail Out | p. 62 |
4 Establishing User Expectations in the Initial Greeting | p. 62 |
4.1 Initial Greeting Experiment | p. 63 |
5 Identifying Recognition Errors Through Confirmations | p. 66 |
5.1 Confirming Digit Strings in Spoken Dialogue Systems | p. 67 |
5.2 Confirmation of Topic in a Spoken Natural Dialogue System | p. 69 |
6 Repairing Recognition Errors With Reprompts | p. 72 |
6.1 Reprompt Experiment | p. 73 |
7 Turn-Taking in Human-Machine Dialogues | p. 76 |
7.1 Caller Tolerance of System Delay | p. 77 |
8 Summary | p. 79 |
References | p. 79 |
3 Linguistics and Psycholinguistics in IVR Design | p. 81 |
1 Introduction | p. 82 |
1.1 Speech Sounds | p. 82 |
1.2 Grammar | p. 83 |
1.2.1 Words | p. 84 |
1.2.2 Sentences | p. 84 |
1.2.3 Meaning | p. 85 |
2 ASR Grammars and Language Understanding | p. 86 |
2.1 Morphology | p. 87 |
2.2 Syntax | p. 88 |
2.3 Semantics | p. 93 |
2.3.1 Synonyms | p. 93 |
2.3.2 Polysemy | p. 94 |
2.4 Putting it All Together | p. 94 |
2.5 ASR Grammars | p. 95 |
2.6 Natural Language Understanding Models | p. 97 |
2.6.1 The Semantic Taxonomy | p. 98 |
2.6.2 Establishing Predicates | p. 100 |
3 Dialog Design | p. 102 |
3.1 Putting it All Together | p. 105 |
3.1.1 Scenario 1 | p. 106 |
3.1.2 Scenario 2 | p. 107 |
4 Consequences of Structural Simplification | p. 108 |
4.1 Semantic Specificity | p. 111 |
4.2 Syntactic Specificity | p. 112 |
Conclusion | p. 113 |
References | p. 113 |
4 Designing the Voice User Interface for Automated Directory Assistance | p. 117 |
1 The Business of DA | p. 117 |
1.1 The Introduction of Automation | p. 118 |
1.2 Early Attempts to Use Speech Recognition | p. 119 |
2 Issues in the Design of VUI for DA | p. 121 |
2.1 Addressing Database Inadequacies | p. 122 |
2.1.1 The Solution: Automated Data Cleaning | p. 123 |
2.2 Pronunciation of Names | p. 123 |
2.3 The First Question | p. 124 |
2.4 Finding the Locality | p. 124 |
2.5 Confirming the Locality | p. 125 |
2.6 Determining the Listing Type | p. 126 |
2.7 Handling Business Requests | p. 127 |
2.7.1 Issues in Grammar Design for Business Listing Automation | p. 127 |
2.7.2 Business Listings Disambiguation | p. 130 |
2.8 Handling Residential Listings | p. 131 |
2.9 General Dialogue Design Issues | p. 133 |
3 Final Thoughts | p. 134 |
References | p. 134 |
5 Spoken Language Interfaces for Embedded Applications | p. 135 |
1 Introduction | p. 135 |
2 Spoken Language Interfaces Development | p. 137 |
2.1 Overview: Current Trends | p. 137 |
2.2 Embedded Speech Applications | p. 139 |
3 Embedded Speech Technologies | p. 141 |
3.1 Technical Constraints and Implementation Methods | p. 141 |
3.2 Embedded Speech Recognition | p. 143 |
3.3 Embedded Speech Synthesis | p. 149 |
4 A Case Study: An Embedded TTS System Implementation | p. 153 |
4.1 A Simplified TTS System Architecture | p. 153 |
4.2 Implementation Issues | p. 155 |
5 The Future of Embedded Speech Interfaces | p. 158 |
References | p. 160 |
6 Speech Generation in Mobile Phones | p. 163 |
1 Introduction | p. 163 |
2 Speaking Telephone? What is it Good for? | p. 165 |
3 Speech Generation Technologies in Mobile Phones | p. 166 |
3.1 Synthesis Technologies | p. 167 |
3.1.1 Limited Vocabulary Concatenation | p. 167 |
3.1.2 Unlimited Text Reading - Text-To-Speech | p. 168 |
3.2 Topic-Related Text Preprocessing | p. 170 |
3.2.1 Exceptions Vocabulary | p. 171 |
3.2.2 Complex Text Transformation | p. 171 |
3.2.3 Language Identification | p. 174 |
4 How to Port Speech Synthesis on a Phone Platform | p. 178 |
5 Limitations and Possibilities Offered by Phone Resources | p. 181 |
6 Implementations | p. 183 |
6.1 The Mobile Phone as a Speaking Aid | p. 183 |
6.2 An SMS-Reading Mobile Phone Application | p. 186 |
Acknowledgements | p. 190 |
References | p. 190 |
7 Voice Messaging User Interface | p. 193 |
1 Introduction | p. 193 |
2 The Touch-Tone Voice Mail User Interface | p. 196 |
2.1 Common Elements of Touch-tone Transactions | p. 197 |
2.1.1 Prompts | p. 197 |
2.1.2 Interruptibility | p. 198 |
2.1.3 Time-outs and Reprompts | p. 199 |
2.1.4 Feedback | p. 200 |
2.1.5 Feedback to Errors | p. 200 |
2.1.6 Menu Length | p. 200 |
2.1.7 Mapping of Keys to Options | p. 201 |
2.1.8 Global Commands | p. 201 |
2.1.9 Use of the "#" and "*" Keys | p. 202 |
2.1.10 Unprompted Options | p. 202 |
2.1.11 Voice and Personality | p. 203 |
2.2 Call Answering | p. 203 |
2.2.1 Call Answering Greetings | p. 206 |
2.3 The Subscriber Interface | p. 206 |
2.4 Retrieving and Manipulating Messages | p. 206 |
2.5 Sending Messages | p. 209 |
2.6 Voice Messaging User Interface Standards | p. 211 |
2.7 Alternative Approaches to Traditional Touch-tone Design | p. 214 |
3 Automatic Speech Recognition and Voice Mail | p. 215 |
4 Unified Messaging and Multimedia Mail | p. 219 |
4.1 Fax Messaging | p. 220 |
4.2 Viewing Voice Mail | p. 221 |
4.3 Listening to E-mail | p. 223 |
4.4 Putting it All Together | p. 224 |
4.5 Mixed Media | p. 225 |
References | p. 226 |
8 Silence Locations and Durations in Dialog Management | p. 231 |
1 Introduction | p. 231 |
2 Prompts and Responses in Dialog Management | p. 233 |
2.1 Dialog Management | p. 233 |
2.2 Word Selection | p. 234 |
2.3 Word Lists | p. 234 |
2.4 Turn-Taking Cues | p. 236 |
3 Time as an Independent Variable - Dialog Model | p. 236 |
3.1 Definition of Terms | p. 237 |
3.2 Examples of Usage | p. 238 |
4 User Behavior | p. 238 |
4.1 Transactional Analysis | p. 238 |
4.2 Verbal Communication | p. 239 |
4.3 Directed Dialogs | p. 239 |
5 Measurements | p. 240 |
5.1 Barge-In | p. 241 |
6 Usability Testing and Results | p. 242 |
6.1 Test Results - United States (early prototype) | p. 244 |
6.2 Test Results - United States (tuned, early prototype) | p. 245 |
6.3 Test Results - United Kingdom | p. 246 |
6.4 Test Results - Italy | p. 247 |
6.5 Test Results - Denmark | p. 249 |
7 Observations and Interpretations | p. 250 |
7.1 Lateral Results | p. 250 |
7.2 Learning - Longitudinal Results | p. 251 |
Conclusions | p. 252 |
Acknowledgement | p. 252 |
References | p. 252 |
9 Using Natural Dialogs as the Basis for Speech Interface Design | p. 255 |
1 Introduction | p. 256 |
1.1 Motivation | p. 256 |
1.2 Natural Dialog Studies | p. 257 |
2 Natural Dialog Case Studies | p. 258 |
2.1 Study #1: SpeechActs Calendar (speech-only, telephone-based) | p. 259 |
2.1.1 Purpose of Application | p. 259 |
2.1.2 Study Design | p. 260 |
2.1.3 Software Design | p. 262 |
2.1.4 Lessons Learned | p. 264 |
2.2 Study #2: Office Monitor (speech-only, microphone-based) | p. 264 |
2.2.1 Purpose of Application | p. 264 |
2.2.2 Study Design | p. 265 |
2.2.3 Software Design | p. 267 |
2.2.4 Lessons Learned | p. 269 |
2.3 Study #3: Automated Customer Service Representative (speech input, speech/graphical output, telephone-based) | p. 269 |
2.3.1 Purpose of Application | p. 269 |
2.3.2 Study Design | p. 269 |
2.3.3 Software Design | p. 275 |
2.3.4 Lessons Learned | p. 278 |
2.4 Study #4: Multimodal Drawing (speech/mouse/keyboard input, speech/graphical output, microphone-based) | p. 278 |
2.4.1 Purpose of Application | p. 278 |
2.4.2 Study Design | p. 279 |
2.4.3 Software Design | p. 283 |
2.4.4 Lessons Learned | p. 286 |
3 Discussion | p. 286 |
3.1 Refining Application Requirements and Functionality | p. 286 |
3.2 Collecting Appropriate Vocabulary | p. 287 |
3.3 Determining Commonly used Grammatical Constructs | p. 287 |
3.4 Discovering Effective Interaction Patterns | p. 287 |
3.5 Helping with Prompt and Feedback Design | p. 288 |
3.6 Getting a Feeling for the Tone of the Conversations | p. 288 |
Conclusion | p. 289 |
Acknowledgements | p. 289 |
References | p. 290 |
10 Telematics: Artificial Passenger and Beyond | p. 291 |
1 Introduction | p. 291 |
2 A Brief Overview of IBM Voice Technologies | p. 292 |
2.1 Conversational Interactivity for Telematics | p. 293 |
2.2 System Architecture | p. 295 |
2.3 Embedded Speech Recognition | p. 297 |
2.4 Distributed Speech Recognition | p. 299 |
3 Evaluating/Predicting the Consequences of Misrecognitions | p. 300 |
4 Improving Voice and State Recognition Performance - Network Data Collection, Learning by Example, Adaptation of Language and Acoustic Models for Similar users | p. 303 |
5 Artificial Passenger | p. 308 |
6 User Modeling Aspects | p. 315 |
6.1 User Model | p. 316 |
6.2 The Adaptive Modeling Process | p. 317 |
6.3 The Control Process | p. 318 |
6.4 Discussion about Time-Lagged Observables and Indicators in a History | p. 319 |
7 Gesture-Based Command Interface | p. 320 |
8 Summary | p. 322 |
Acknowledgements | p. 323 |
References | p. 323 |
11 A Language to Write Letter-To-Sound Rules for English and French | p. 327 |
1 Introduction | p. 327 |
2 The Historic Evolution of English and French | p. 329 |
3 The Complexity of the Conversion for English and French | p. 329 |
4 Rule Formalism | p. 334 |
5 Examples of Rules for English | p. 340 |
6 Examples of Rules for French | p. 345 |
Conclusions | p. 353 |
References | p. 354 |
Appendices for French | p. 356 |
Appendices for English | p. 359 |
12 Virtual Sentences of Spontaneous Speech: Boundary Effects of Syntactic-Semantic-Prosodic Properties | p. 361 |
1 Introduction | p. 361 |
2 Method and Material | p. 364 |
2.1 Subjects | p. 364 |
2.2 Speech Material | p. 364 |
2.3 Procedure | p. 365 |
3 Results | p. 366 |
3.1 Identification of Virtual Sentences in the Normal and Filtered Speech Samples | p. 366 |
3.2 Pauses of the Speech Sample | p. 368 |
3.3 Pause Perception | p. 370 |
3.4 F0 Patterns | p. 372 |
3.5 Comprehension of the Spontaneous Speech Sample | p. 374 |
3.6 The Factor of Gender | p. 375 |
Conclusions | p. 375 |
Acknowledgements | p. 377 |
References | p. 377 |
13 Text-to-Speech Formant Synthesis For French | p. 381 |
1 Introduction | p. 381 |
2 Grapheme-to-Phoneme Conversion | p. 382 |
2.1 Normalization: From Grapheme to Grapheme | p. 382 |
2.2 From Grapheme to Phoneme | p. 384 |
2.3 Exception Dictionary | p. 385 |
3 Prosody | p. 385 |
3.1 Parsing the Text | p. 385 |
3.2 Intonation | p. 386 |
3.3 Phoneme Duration | p. 391 |
4 Acoustics for French Consonants and Vowels | p. 398 |
4.1 Vowels | p. 398 |
4.2 Fricatives (unvoiced: F, S, Ch; voiced: V, Z, J) | p. 400 |
4.3 Plosives (unvoiced: P, T, K; voiced: B, D, G) | p. 401 |
4.4 Nasals (M, N, Gn, Ng) | p. 403 |
4.5 Liquids (L, R) | p. 404 |
4.6 Semivowels (Y, W, Wu) | p. 405 |
4.7 Phoneme Transitions (coarticulation effects) | p. 405 |
4.8 Frame Generation | p. 409 |
4.9 Conclusions for Acoustics | p. 409 |
5 From Acoustics to Speech Signal | p. 410 |
6 Next Generation Formant Synthesis | p. 412 |
7 Singing | p. 414 |
Conclusions | p. 414 |
References | p. 415 |
14 Accessibility and Speech Technology: Advancing Toward Universal Access | p. 417 |
1 Universal Access vs. Assistive Technology | p. 417 |
2 Predicted Enhancements and Improvements to Underlying Technology | p. 419 |
2.1 Social Network Analysis, Blogs, Wikis, and Social Computing | p. 420 |
2.2 Intelligent Agents | p. 421 |
2.3 Learning Objects | p. 422 |
2.4 Cognitive Aids | p. 423 |
2.5 Interface Flexibility and Intelligence | p. 423 |
3 Current Assistive Technology Applications Employing Speech Technology | p. 423 |
3.1 Applications Employing Automatic Speech Recognition (ASR) | p. 424 |
3.2 Applications of Synthetic Speech | p. 428 |
4 Human-Computer Interaction: Design and Evaluation | p. 430 |
5 The Role of Technical Standards in Accessibility | p. 433 |
5.1 Standards Related to Software and Information Technology User Interfaces | p. 434 |
5.2 Speech Application Accessibility Standards | p. 434 |
5.3 Accessibility Data and Accessibility Guidance for General Products | p. 437 |
Conclusions | p. 439 |
References | p. 440 |
15 Synthesized Speech Used for the Evaluation of Children's Hearing and Speech Perception | p. 443 |
1 Introduction | p. 443 |
2 The Background Theory | p. 444 |
3 The Production of the Synthesized Word Material | p. 447 |
4 Pre-Experiments for the Application of Synthesized Words for Hearing Screening | p. 449 |
5 Results | p. 450 |
5.1 Clinical Tests | p. 450 |
5.2 Screening Procedure | p. 453 |
5.3 Evaluation of Acoustic-phonetic Perception | p. 456 |
5.4 Children with Specific Needs | p. 457 |
Conclusions | p. 458 |
Acknowledgements | p. 459 |
References | p. 459 |
Index | p. 461 |