Introduction to video search engines

The evolution of technology has set the stage for the rapid growth of the video Web: broadband Internet access is ubiquitous, and streaming media protocols, systems, and encoding standards are mature. In addition to Web video delivery, users can easily contribute content captured on low-cost camera phones and other consumer products. The media and entertainment industry no longer views these developments as a threat to its established business practices, but as an opportunity to provide services for more viewers in a wider range of consumption contexts. The emergence of IPTV and mobile video services offers unprecedented access to an ever-growing number of broadcast channels and provides the flexibility to deliver new, more personalized video services. Highly capable portable media players allow us to take this personalized content with us, and to consume it even in places where the network does not reach. Video search engines enable users to take advantage of these emerging video resources for a wide variety of applications including entertainment, education and communications. However, the task of information extraction from video for retrieval applications is challenging, providing opportunities for innovation. This book aims first to describe the current state of video search engine technology and second to inform those with the requisite technical skills of the opportunities to contribute to the development of this field. Today's Web search engines have greatly improved the accessibility and therefore the value of the Web.

Author Notes

David Gibbon joined Bell Laboratories in 1985 and is currently a Lead Member of Technical Staff in the Video and Multimedia Services Research Department at AT&T Labs - Research. His research interests include multimedia processing for searching and browsing of video databases and real-time video processing for communications applications. David has written book chapters and encyclopedia articles as well as numerous technical papers; he has 40 US patent filings and holds 14 US patents in the areas of multimedia indexing, streaming, and video analysis; and he is a member of the ACM and a senior member of the IEEE. David contributes to IPTV industry standards for metadata, and in 2007 he was awarded the AT&T Science and Technology Medal for outstanding technical leadership and innovation in the field of Video and Multimedia Processing and Digital Content Management.

Zhu Liu joined AT&T Labs - Research in 2000 and is currently a Principal Member of Technical Staff in the Video and Multimedia Services Research Department. His research interests include multimedia content processing, multimedia databases, pattern recognition, and machine learning. Zhu holds 7 US patents and is the inventor of more than 20 pending patents in the areas of multimedia service and content analysis. He has published more than 40 refereed papers in leading international journals and at key conferences in the area of multimedia. He is a member of the ACM and Tau Beta Pi, and a senior member of the IEEE.

Preface	p. v
1 Video Search	p. 1
1.1 Introduction	p. 1
1.2 Addressing the Opportunity	p. 2
1.3 Classification of Web Video Sites	p. 5
1.3.1 Content Originators and Traditional Broadcasters	p. 5
1.3.2 Aggregators	p. 6
1.3.3 Download	p. 6
1.3.4 Sharing	p. 6
1.3.5 Application Specific	p. 7
1.3.6 Other Video Systems	p. 7
1.4 Classification of Video Sources	p. 8
1.4.1 Webcams / Security	p. 9
1.4.2 Video Telephony / Teleconferencing	p. 9
1.4.3 Industrial / Academic / Medical	p. 9
1.4.4 User Generated Content	p. 10
1.4.5 Public Access and Government (PEG) Content	p. 10
1.4.6 Enterprise Content	p. 10
1.4.7 Rushes, Raw Footage	p. 11
1.4.8 News	p. 11
1.4.9 Advertising	p. 11
1.4.10 Episodic TV Programming	p. 11
1.4.11 Feature Films	p. 12
1.4.12 Content Value	p. 12
1.5 Challenges of Video Search	p. 13
1.5.1 Acquisition	p. 14
1.5.2 Media File Formats	p. 15
1.5.3 Data Transport	p. 16
1.5.4 Browsing	p. 16
1.5.5 Duplication	p. 17
1.5.6 Ranking and Indexing	p. 17
1.6 Advantages of Video Search over Text	p. 18
1.6.1 Applications	p. 18
1.6.2 Metadata	p. 19
1.7 Metadata vs. Content	p. 19
1.7.1 Content-based retrieval	p. 19
1.8 Conclusion	p. 20
References	p. 21
2 Video Data Sources and Applications	p. 23
2.1 Introduction	p. 23
2.1.1 Evolution of Digital Media Metadata	p. 23
2.1.2 Consumer Video Metadata	p. 24
2.1.3 Metadata Loss	p. 24
2.1.4 Metadata Standards	p. 25
2.1.5 Dublin Core	p. 26
2.1.6 MPEG-7	p. 27
2.1.7 MPEG-21	p. 27
2.2 Essential Media Metadata	p. 29
2.2.1 Embed Global Metadata	p. 29
2.2.2 Elementary Metadata	p. 29
2.3 Metadata for Personal Media Collections	p. 31
2.3.1 Consumer Media Libraries	p. 31
2.3.2 UPnP Forum	p. 33
2.3.3 MP3 ID3	p. 33
2.3.4 3GP / QuickTime / MP4	p. 34
2.3.5 Metadata Services	p. 34
2.3.6 Content Identification	p. 36
2.3.7 Recorded Television	p. 37
2.4 Media Syndication: RSS Content Description	p. 39
2.4.1 Content Syndication	p. 39
2.4.2 Media Enclosures	p. 39
2.4.3 Podcasts	p. 41
2.4.4 RSS for Content Ingest	p. 42
2.4.5 MediaRSS	p. 43
2.5 Metadata for Broadcast Television	p. 43
2.5.1 Electronic Programming Guide (EPG)	p. 44
2.5.2 Extended Data Service (XDS)	p. 46
2.5.3 Program and System Identifier Protocol (PSIP)	p. 47
2.6 Metadata for Video on Demand	p. 47
2.6.1 Introduction	p. 47
2.6.2 Cable Labs	p. 49
2.7 Production Metadata	p. 50
2.8 Timed Text Formats	p. 51
2.8.1 Introduction	p. 51
2.8.2 Synchronization Precision and Resolution	p. 52
2.8.3 Transcripts	p. 53
2.8.4 Closed Captions	p. 54
2.8.5 Synchronized Accessible Media Interchange	p. 55
2.8.6 Metadata from Social Sources	p. 55
2.8.7 Metadata Issues	p. 55
2.9 Conclusion	p. 56
References	p. 56
3 Internet Video	p. 59
3.1 Introduction	p. 59
3.2 Digital Video	p. 59
3.2.1 Aspect Ratio	p. 59
3.2.2 Luminance and Chrominance Resolution	p. 61
3.2.3 Video Compression	p. 62
3.3 Internet Protocol Media Systems	p. 66
3.3.1 Transport	p. 66
3.3.2 Searching VoD vs. Live	p. 67
3.3.3 IPTV	p. 68
3.3.4 Rights Management	p. 70
3.3.5 Redirector Files	p. 70
3.3.6 Layered Encoding	p. 73
3.3.7 Illustrated Audio	p. 73
3.4 Media Captioning	p. 74
3.5 Conclusion	p. 75
References	p. 76
4 Video Search Engine Systems	p. 77
4.1 Introduction	p. 77
4.2 Content Acquisition	p. 78
4.2.1 Metadata Normalization	p. 78
4.2.2 User Contributed	p. 79
4.2.3 Syndicated Contribution	p. 80
4.2.4 Broadcast Acquisition	p. 81
4.3 Content Processing	p. 82
4.3.1 Asset Management	p. 82
4.4 Retrieval	p. 84
4.5 User Perspectives	p. 85
4.5.1 Interaction States	p. 85
4.5.2 Granularity of Search Results Representation	p. 87
4.6 Factors Concerning Scalability	p. 88
4.6.1 Introduction	p. 88
4.6.2 Acquisition	p. 89
4.6.3 Processing	p. 89
4.6.4 Storage	p. 90
4.6.5 Retrieval	p. 91
4.7 Retrieval Interfaces	p. 92
4.8 Typical System Features	p. 93
4.9 Conclusion	p. 94
References	p. 94
5 Media Processing	p. 97
5.1 Introduction	p. 97
5.2 Feature Extraction	p. 99
5.3 Media Segmentation	p. 100
5.4 Clustering, Structure Generation	p. 101
5.5 Real-Time Processing	p. 103
5.6 Systems Issues and Architectures	p. 103
5.7 Conclusion	p. 104
References	p. 105
6 Video Processing	p. 107
6.1 Introduction	p. 107
6.2 Shot Boundary Determination	p. 108
6.2.1 Feature Extraction	p. 110
6.2.2 Shot Boundary Detectors	p. 111
6.2.3 Fusion of Detector Results	p. 117
6.2.4 Evaluation Results	p. 117
6.3 Representative Image Selection	p. 118
6.4 Face Detection	p. 121
6.5 Face Recognition	p. 126
6.6 Video Optical Character Recognition	p. 129
6.7 Concept Detection	p. 131
6.7.1 Color Feature	p. 133
6.7.2 Texture Feature	p. 133
6.7.3 Edge Feature	p. 135
6.8 Video Browsing	p. 135
6.9 Conclusion	p. 140
References	p. 141
7 Audio Processing	p. 145
7.1 Introduction	p. 145
7.2 Audio Signal and Its Representation	p. 146
7.3 Audio Features	p. 148
7.3.1 Frame-Level Features	p. 148
7.3.2 Clip-Level Features	p. 154
7.4 Audio Segmentation	p. 156
7.4.1 Speaker Segmentation	p. 157
7.4.2 Audio Scene Segmentation	p. 158
7.5 Audio Content Categorization	p. 160
7.5.1 Speaker Recognition	p. 160
7.5.2 Audio Scene Detection	p. 162
7.5.3 Music Genre Classification	p. 163
7.6 Speech Recognition	p. 164
7.7 Audio Query and Browsing Techniques	p. 166
7.7.1 SpeechLogger	p. 167
7.7.2 Query by Example	p. 171
7.8 Conclusion	p. 172
References	p. 173
8 Text Processing	p. 177
8.1 Introduction	p. 177
8.2 Story Segmentation	p. 178
8.2.1 Cue Phrases	p. 178
8.2.2 Cosine Similarity	p. 179
8.2.3 Dynamic Programming	p. 181
8.2.4 Topic Classification	p. 183
8.3 Named Entity Extraction	p. 183
8.3.1 Rule Based NEE	p. 184
8.3.2 Data Driven NEE	p. 185
8.3.3 NEE Tools	p. 186
8.4 Part-of-Speech Tagging	p. 187
8.5 Capitalization	p. 189
8.5.1 Linguistic Processing Architecture	p. 191
8.5.2 Web Document Collection	p. 191
8.5.3 Text Capitalization Algorithm	p. 192
8.6 Information Retrieval	p. 194
8.6.1 Stemming	p. 194
8.6.2 Term Weighting	p. 195
8.6.3 Ranking	p. 196
8.7 Text Summarization	p. 197
8.7.1 Keyword Extraction	p. 199
8.8 Conclusion	p. 201
References	p. 201
9 Multimodal Processing	p. 203
9.1 Introduction	p. 203
9.2 Case Studies	p. 205
9.2.1 Closed Caption Alignment	p. 205
9.2.2 Multimodal News Story Segmentation	p. 209
9.2.3 Major Cast Detection	p. 214
9.3 Conclusion	p. 217
References	p. 217
10 Research Systems	p. 221
10.1 Introduction	p. 221
10.2 Academic and Industrial Research	p. 222
10.3 Early Internet Deployments	p. 226
10.3.1 SpeechBot	p. 226
10.3.2 StreamSage	p. 227
10.3.3 SingingFish	p. 227
10.4 Selected Commercial Systems	p. 228
10.4.1 Virage and Convera	p. 228
10.4.2 Nexidia (FastTalk)	p. 228
10.5 Resources: Datasets, Evaluations, Conferences	p. 229
10.6 Media Monitoring Deployments	p. 231
10.7 Case Study: AT&T MIRACLE	p. 232
10.7.1 Introduction	p. 232
10.7.2 System Architecture	p. 232
10.7.3 Collections	p. 233
10.7.4 Data Organization	p. 235
10.7.5 Acquisition / Ingest	p. 236
10.7.6 Content Processing	p. 238
10.7.7 Real-time processing	p. 239
10.7.8 Query Engine	p. 239
10.7.9 Applications	p. 240
10.7.10 Performance	p. 240
10.8 Conclusion	p. 242
References	p. 242
11 Current Trends in Video Search	p. 247
11.1 Introduction	p. 247
11.2 Video Production	p. 248
11.2.1 Metadata Retention	p. 248
11.2.2 Multiple Distribution Channels	p. 248
11.2.3 Mobisodes and Webisodes	p. 249
11.3 Video Distribution	p. 249
11.3.1 Streaming Protocols	p. 250
11.3.2 Electronic Sell Through	p. 250
11.3.3 Peer-to-peer Delivery	p. 251
11.3.4 Managed Download	p. 251
11.3.5 Syndication	p. 252
11.4 The Video Web and User Interaction	p. 252
11.4.1 Web-Based Editing	p. 252
11.4.2 Media Browsing	p. 252
11.4.3 Social Tagging	p. 253
11.4.4 Dynamic Interfaces	p. 253
11.4.5 Video Blogs (vlogs)	p. 254
11.4.6 Integrated Collections	p. 254
11.5 Television Technology and Consumption	p. 254
11.5.1 Proliferation of Channels	p. 255
11.5.2 Live to Time Shifted	p. 255
11.5.3 Mobile Consumption	p. 255
11.6 Trends in Media Devices	p. 256
11.6.1 Increased Media Capabilities	p. 256
11.6.2 Increasing Accessibility	p. 257
11.6.3 DRM	p. 257
11.6.4 Home Media Systems	p. 257
11.7 Media Processing Research	p. 257
11.8 Deployments	p. 260
11.9 Conclusion	p. 261
References	p. 261
Glossary	p. 265
Index	p. 271
