A unified framework for video summarization, browsing and retrieval : with applications to consumer and surveillance video

Large volumes of video content can be accessed easily only through rapid browsing and retrieval techniques. Constructing a video table of contents (ToC) and video highlights is essential to letting end users sift through all this data and find what they want, when they want it. This reference puts forth a unified framework that integrates these functions to support efficient browsing and retrieval of video content. The authors have developed a cohesive way to create a video table of contents, video highlights, and video indices that streamline consumer and surveillance video applications.

The authors discuss the generation of a table of contents, the extraction of highlights, different techniques for audio and video marker recognition, and indexing with low-level features such as color, texture, and shape. Current applications of this summarization and browsing technology are also reviewed. Applications such as event detection in elevator surveillance, highlight extraction from sports video, and image and video database management are considered within the proposed framework. This book presents the latest research in these areas, and readers will find broad coverage of the topic in this volume.

Author Notes

Ziyou Xiong is a senior research engineer/scientist in the Dynamic Modeling and Analysis group of the United Technologies Research Center.
Regunathan Radhakrishnan is currently a visiting researcher at Mitsubishi Electric Research Laboratories.
Ajay Divakaran currently leads the Data and Sensor Systems Team at the Technology Laboratory of Mitsubishi Electric Research Laboratories.
Yong Rui is a researcher in the Communication and Collaboration Systems group at Microsoft Research, where he leads the Multimedia Collaboration team.
Thomas S. Huang is a William L. Everitt Distinguished Professor of Electrical and Computer Engineering and head of the Image Formation and Processing Group at the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign.

List of Figures	p. xi
List of Tables	p. xvii
Preface	p. xix
Acknowledgments	p. xxi
Chapter 1 Introduction	p. 1
1.1 Introduction	p. 1
1.2 Terminology	p. 3
1.3 Video Analysis	p. 6
1.3.1 Shot Boundary Detection	p. 6
1.3.2 Key Frame Extraction	p. 6
1.3.3 Play/Break Segmentation	p. 7
1.3.4 Audio Marker Detection	p. 7
1.3.5 Video Marker Detection	p. 7
1.4 Video Representation	p. 7
1.4.1 Video Representation for Scripted Content	p. 8
1.4.2 Video Representation for Unscripted Content	p. 9
1.5 Video Browsing and Retrieval	p. 11
1.5.1 Video Browsing Using ToC-Based Summary	p. 11
1.5.2 Video Browsing Using Highlights-Based Summary	p. 11
1.5.3 Video Retrieval	p. 12
1.6 The Rest of the Book	p. 12
Chapter 2 Video Table-of-Content Generation	p. 15
2.1 Introduction	p. 15
2.2 Related Work	p. 17
2.2.1 Shot- and Key Frame-Based Video ToC	p. 17
2.2.2 Group-Based Video ToC	p. 18
2.2.3 Scene-Based Video ToC	p. 19
2.3 The Proposed Approach	p. 20
2.3.1 Shot Boundary Detection and Key Frame Extraction	p. 20
2.3.2 Spatiotemporal Feature Extraction	p. 20
2.3.3 Time-Adaptive Grouping	p. 21
2.3.4 Scene Structure Construction	p. 24
2.4 Determination of the Parameters	p. 30
2.4.1 Gaussian Normalization	p. 30
2.4.2 Determining W_C and W_A	p. 31
2.4.3 Determining groupThreshold and sceneThreshold	p. 32
2.5 Experimental Results	p. 33
2.6 Conclusions	p. 37
Chapter 3 Highlights Extraction from Unscripted Video	p. 39
3.1 Introduction	p. 39
3.1.1 Audio Marker Recognition	p. 39
3.1.2 Visual Marker Detection	p. 39
3.1.3 Audio-Visual Marker Association and Finer-Resolution Highlights	p. 41
3.2 Audio Marker Recognition	p. 42
3.2.1 Estimating the Number of Mixtures in GMMs	p. 42
3.2.2 Evaluation Using the Precision-Recall Curve	p. 44
3.2.3 Performance Comparison	p. 46
3.2.4 Experimental Results on Golf Highlights Generation	p. 47
3.3 Visual Marker Detection	p. 52
3.3.1 Motivation	p. 52
3.3.2 Choice of Visual Markers	p. 52
3.3.3 Robust Real-Time Object Detection Algorithm	p. 60
3.3.4 Results of Baseball Catcher Detection	p. 62
3.3.5 Results of Soccer Goalpost Detection	p. 64
3.3.6 Results of Golfer Detection	p. 68
3.4 Finer-Resolution Highlights Extraction	p. 71
3.4.1 Audio-Visual Marker Association	p. 71
3.4.2 Finer-Resolution Highlights Classification	p. 71
3.4.3 Method 1: Clustering	p. 72
3.4.4 Method 2: Color/Motion Modeling Using HMMs	p. 73
3.4.5 Method 3: Audio-Visual Modeling Using CHMMs	p. 82
3.4.6 Experimental Results with DCHMM	p. 85
3.5 Conclusions	p. 96
Chapter 4 Video Structure Discovery Using Unsupervised Learning	p. 97
4.1 Motivation and Related Work	p. 97
4.2 Proposed Inlier/Outlier-Based Representation for "Unscripted" Multimedia Using Audio Analysis	p. 98
4.3 Feature Extraction and the Audio Classification Framework	p. 101
4.3.1 Feature Extraction	p. 102
4.3.2 Mel Frequency Cepstral Coefficients (MFCC)	p. 102
4.3.3 Modified Discrete Cosine Transform (MDCT) Features from AC-3 Stream	p. 103
4.3.4 Audio Classification Framework	p. 109
4.4 Proposed Time Series Analysis Framework	p. 111
4.4.1 Problem Formulation	p. 112
4.4.2 Kernel/Affinity Matrix Computation	p. 113
4.4.3 Segmentation Using Eigenvector Analysis of Affinity Matrices	p. 114
4.4.4 Past Work on Detecting "Surprising" Patterns from Time Series	p. 117
4.4.5 Proposed Outlier Subsequence Detection in Time Series	p. 119
4.4.6 Generative Model for Synthetic Time Series	p. 121
4.4.7 Performance of the Normalized Cut for Case 2	p. 122
4.4.8 Comparison with Other Clustering Approaches for Case 2	p. 127
4.4.9 Performance of Normalized Cut for Case 3	p. 135
4.5 Ranking Outliers for Summarization	p. 141
4.5.1 Kernel Density Estimation	p. 141
4.5.2 Confidence Measure for Outliers with Binomial and Multinomial PDF Models for the Contexts	p. 142
4.5.3 Confidence Measure for Outliers with GMM and HMM Models for the Contexts	p. 149
4.5.4 Using Confidence Measures to Rank Outliers	p. 153
4.6 Application to Consumer Video Browsing	p. 154
4.6.1 Highlights Extraction from Sports Video	p. 154
4.6.2 Scene Segmentation for Situation Comedy Videos	p. 171
4.7 Systematic Acquisition of Key Audio Classes	p. 179
4.7.1 Application to Sports Highlights Extraction	p. 179
4.7.2 Event Detection in Elevator Surveillance Audio	p. 185
4.8 Possibilities for Future Research	p. 192
Chapter 5 Video Indexing	p. 199
5.1 Introduction	p. 199
5.1.1 Motivation	p. 199
5.1.2 Overview of MPEG-7	p. 199
5.2 Indexing with Low-Level Features: Motion	p. 200
5.2.1 Introduction	p. 200
5.2.2 Overview of MPEG-7 Motion Descriptors	p. 201
5.2.3 Camera Motion Descriptor	p. 201
5.2.4 Motion Trajectory	p. 203
5.2.5 Parametric Motion	p. 203
5.2.6 Motion Activity	p. 204
5.2.7 Applications of Motion Descriptors	p. 206
5.2.8 Video Browsing System Based on Motion Activity	p. 208
5.2.9 Conclusion	p. 212
5.3 Indexing with Low-Level Features: Color	p. 212
5.4 Indexing with Low-Level Features: Texture	p. 213
5.5 Indexing with Low-Level Features: Shape	p. 214
5.6 Indexing with Low-Level Features: Audio	p. 215
5.7 Indexing with User Feedback	p. 217
5.8 Indexing Using Concepts	p. 218
5.9 Discussion and Conclusions	p. 219
Chapter 6 A Unified Framework for Video Summarization, Browsing, and Retrieval	p. 221
6.1 Video Browsing	p. 221
6.2 Video Highlights Extraction	p. 223
6.2.1 Audio Marker Detection	p. 223
6.2.2 Visual Marker Detection	p. 224
6.2.3 Audio-Visual Markers Association for Highlights Candidates Generation	p. 225
6.2.4 Finer-Resolution Highlights Recognition and Verification	p. 226
6.3 Video Retrieval	p. 227
6.4 A Unified Framework for Summarization, Browsing, and Retrieval	p. 229
6.5 Conclusions and Promising Research Directions	p. 235
Chapter 7 Applications	p. 237
7.1 Introduction	p. 237
7.2 Consumer Video Applications	p. 238
7.2.1 Challenges for Consumer Video Browsing Applications	p. 241
7.3 Image/Video Database Management	p. 242
7.4 Surveillance	p. 244
7.5 Challenges of Current Applications	p. 247
7.6 Conclusions	p. 247
Chapter 8 Conclusions	p. 249
Bibliography	p. 253
About the Authors	p. 261
Index	p. 265
