Multimedia semantics : metadata, analysis and interaction

This book explains, collects and reports on the latest research results that aim at narrowing the so-called multimedia "Semantic Gap": the large disparity between descriptions of multimedia content that can be computed automatically, and the richness and subjectivity of semantics in user queries and human interpretations of audiovisual media. Addressing the grand challenge posed by the "Semantic Gap" requires a multi-disciplinary approach (computer science, computer vision and signal processing, cognitive science, web science, etc.) and this is reflected in recent research in this area. In addition, the book targets an interdisciplinary community, and in particular the Multimedia and the Semantic Web communities. Finally, the authors provide both the fundamental knowledge and the latest state-of-the-art results from both communities with the goal of making the knowledge of one community available to the other.

Key Features:

Presents state-of-the-art research results in multimedia semantics: multimedia analysis, metadata standards and multimedia knowledge representation, semantic interaction with multimedia
Contains real industrial problems exemplified by user case scenarios
Offers an insight into various standardisation bodies including W3C, IPTC and ISO MPEG
Contains contributions from academic and industrial communities from Europe, the USA and Asia
Includes an accompanying website containing user cases, datasets, and software mentioned in the book, as well as links to the K-Space NoE and the SMaRT society web sites (http://www.multimediasemantics.com/)

This book will be a valuable reference for academic and industry researchers and practitioners in the fields of multimedia, computational intelligence and computer science. Graduate students, project leaders, and consultants will also find this book of interest.

Author Notes

Dr. Raphaël Troncy, Centre for Mathematics and Computer Science, Netherlands
Raphaël Troncy completed his Master's thesis with honours in computer science at the University Joseph Fourier of Grenoble, France. He received his PhD with honours in 2004. His research interests include Semantic Web and Multimedia Technologies, Knowledge Representation, and Ontology Modeling and Alignment. Raphaël Troncy is an expert in audiovisual metadata and in combining existing metadata standards (such as MPEG-7) with current Semantic Web technologies.

Dr. Benoit Huet, Institut EURECOM, France
Benoit Huet received his BSc degree in computer science and engineering from the Ecole Superieure de Technologie Electrique (Groupe ESIEE, France) in 1992. In 1993, he was awarded the MSc degree in Artificial Intelligence from the University of Westminster (UK) with distinction. He received his PhD degree in Computer Science from the University of York (UK). His research interests include computer vision, content-based retrieval, multimedia data mining and indexing (still and/or moving images) and pattern recognition.

Simon Schenk, University of Koblenz-Landau, Germany
Simon Schenk is a research and teaching assistant at the Information Systems and Semantic Web Group of the University of Koblenz-Landau. Simon is working towards his PhD degree under the supervision of Professor Dr. Steffen Staab. Previously, he worked as a consultant for Capgemini. Schenk studied at NORDAKADEMIE University of Applied Sciences, Germany, and Karlstads Universitet, Sweden, and received his diploma in Computer Science and Business Management from NORDAKADEMIE in 2004.

Raphaël Troncy and Benoit Huet and Simon Schenk
Werner Bailer and Susanne Boll and Oscar Celma and Michael Hausenblas and Yves Raimond
Lynda Hardman and Zeljko Obrenovic and Frank Nack
Rachid Benmokhtar and Benoit Huet and Gaël Richard and Slim Essid
Slim Essid and Marine Campedel and Gaël Richard and Tomas Piatrik and Rachid Benmokhtar and Benoit Huet
Eyal Oren and Simon Schenk
Antoine Isaac and Simon Schenk and Ansgar Scherp
Peter Schallauer and Werner Bailer and Raphaël Troncy and Florian Kaiser
Thomas Franz and Raphaël Troncy and Miroslav Vacura
Thanos Athanasiadis and Phivos Mylonas and Georgios Th. Papadopoulos and Vasileios Mezaris and Yannis Avrithis and Ioannis Kompatsiaris and Michael G. Strintzis
Nikolaos Simou and Giorgos Stoilos and Carsten Saathoff and Jan Nemrava and Vojtěch Svátek and Petr Berka and Vassilis Tzouvaras
Noel E. O'Connor and David A. Sadlier and Bart Lehane and Andrew Salway and Jan Nemrava and Paul Buitelaar
Carsten Saathoff and Krishna Chandramouli and Werner Bailer and Peter Schallauer and Raphaël Troncy
Frank Hopfgartner and Reede Ren and Thierry Urruty and Joemon M. Jose
Michiel Hildebrand and Jacco van Ossenbruggen and Lynda Hardman
Raphaël Troncy and Benoit Huet and Simon Schenk

Foreword	p. xi
List of Figures	p. xiii
List of Tables	p. xvii
List of Contributors	p. xix
1 Introduction	p. 1
2 Use Case Scenarios	p. 7
2.1 Photo Use Case	p. 8
2.1.1 Motivating Examples	p. 8
2.1.2 Semantic Description of Photos Today	p. 9
2.1.3 Services We Need for Photo Collections	p. 10
2.2 Music Use Case	p. 10
2.2.1 Semantic Description of Music Assets	p. 11
2.2.2 Music Recommendation and Discovery	p. 12
2.2.3 Management of Personal Music Collections	p. 13
2.3 Annotation in Professional Media Production and Archiving	p. 14
2.3.1 Motivating Examples	p. 15
2.3.2 Requirements for Content Annotation	p. 17
2.4 Discussion	p. 18
Acknowledgements	p. 19
3 Canonical Processes of Semantically Annotated Media Production	p. 21
3.1 Canonical Processes	p. 22
3.1.1 Premeditate	p. 23
3.1.2 Create Media Asset	p. 23
3.1.3 Annotate	p. 23
3.1.4 Package	p. 24
3.1.5 Query	p. 24
3.1.6 Construct Message	p. 25
3.1.7 Organize	p. 25
3.1.8 Publish	p. 26
3.1.9 Distribute	p. 26
3.2 Example Systems	p. 27
3.2.1 CeWe Color Photo Book	p. 27
3.2.2 SenseCam	p. 29
3.3 Conclusion and Future Work	p. 33
4 Feature Extraction for Multimedia Analysis	p. 35
4.1 Low-Level Feature Extraction	p. 36
4.1.1 What Are Relevant Low-Level Features?	p. 36
4.1.2 Visual Descriptors	p. 36
4.1.3 Audio Descriptors	p. 45
4.2 Feature Fusion and Multi-modality	p. 54
4.2.1 Feature Normalization	p. 54
4.2.2 Homogeneous Fusion	p. 55
4.2.3 Cross-modal Fusion	p. 56
4.3 Conclusion	p. 58
5 Machine Learning Techniques for Multimedia Analysis	p. 59
5.1 Feature Selection	p. 61
5.1.1 Selection Criteria	p. 61
5.1.2 Subset Search	p. 62
5.1.3 Feature Ranking	p. 63
5.1.4 A Supervised Algorithm Example	p. 63
5.2 Classification	p. 65
5.2.1 Historical Classification Algorithms	p. 65
5.2.2 Kernel Methods	p. 67
5.2.3 Classifying Sequences	p. 71
5.2.4 Biologically Inspired Machine Learning Techniques	p. 73
5.3 Classifier Fusion	p. 75
5.3.1 Introduction	p. 75
5.3.2 Non-trainable Combiners	p. 75
5.3.3 Trainable Combiners	p. 76
5.3.4 Combination of Weak Classifiers	p. 77
5.3.5 Evidence Theory	p. 78
5.3.6 Consensual Clustering	p. 78
5.3.7 Classifier Fusion Properties	p. 80
5.4 Conclusion	p. 80
6 Semantic Web Basics	p. 81
6.1 The Semantic Web	p. 82
6.2 RDF	p. 83
6.2.1 RDF Graphs	p. 86
6.2.2 Named Graphs	p. 87
6.2.3 RDF Semantics	p. 88
6.3 RDF Schema	p. 90
6.4 Data Models	p. 93
6.5 Linked Data Principles	p. 94
6.5.1 Dereferencing Using Basic Web Look-up	p. 95
6.5.2 Dereferencing Using HTTP 303 Redirects	p. 95
6.6 Development Practicalities	p. 96
6.6.1 Data Stores	p. 97
6.6.2 Toolkits	p. 97
7 Semantic Web Languages	p. 99
7.1 The Need for Ontologies on the Semantic Web	p. 100
7.2 Representing Ontological Knowledge Using OWL	p. 100
7.2.1 OWL Constructs and OWL Syntax	p. 100
7.2.2 The Formal Semantics of OWL and its Different Layers	p. 102
7.2.3 Reasoning Tasks	p. 106
7.2.4 OWL Flavors	p. 107
7.2.5 Beyond OWL	p. 107
7.3 A Language to Represent Simple Conceptual Vocabularies: SKOS	p. 108
7.3.1 Ontologies versus Knowledge Organization Systems	p. 108
7.3.2 Representing Concept Schemes Using SKOS	p. 109
7.3.3 Characterizing Concepts beyond SKOS	p. 111
7.3.4 Using SKOS Concept Schemes on the Semantic Web	p. 112
7.4 Querying on the Semantic Web	p. 113
7.4.1 Syntax	p. 113
7.4.2 Semantics	p. 118
7.4.3 Default Negation in SPARQL	p. 123
7.4.4 Well-Formed Queries	p. 124
7.4.5 Querying for Multimedia Metadata	p. 124
7.4.6 Partitioning Datasets	p. 126
7.4.7 Related Work	p. 127
8 Multimedia Metadata Standards	p. 129
8.1 Selected Standards	p. 130
8.1.1 MPEG-7	p. 130
8.1.2 EBU P_Meta	p. 132
8.1.3 SMPTE Metadata Standards	p. 133
8.1.4 Dublin Core	p. 133
8.1.5 TV-Anytime	p. 134
8.1.6 METS and VRA	p. 134
8.1.7 MPEG-21	p. 135
8.1.8 XMP, IPTC in XMP	p. 135
8.1.9 EXIF	p. 136
8.1.10 DIG35	p. 137
8.1.11 ID3/MP3	p. 137
8.1.12 NewsML G2 and rNews	p. 138
8.1.13 W3C Ontology for Media Resources	p. 138
8.1.14 EBUCore	p. 139
8.2 Comparison	p. 140
8.3 Conclusion	p. 143
9 The Core Ontology for Multimedia	p. 145
9.1 Introduction	p. 145
9.2 A Multimedia Presentation for Granddad	p. 146
9.3 Related Work	p. 149
9.4 Requirements for Designing a Multimedia Ontology	p. 150
9.5 A Formal Representation for MPEG-7	p. 150
9.5.1 DOLCE as Modeling Basis	p. 151
9.5.2 Multimedia Patterns	p. 151
9.5.3 Basic Patterns	p. 155
9.5.4 Comparison with Requirements	p. 157
9.6 Granddad's Presentation Explained by COMM	p. 157
9.7 Lessons Learned	p. 159
9.8 Conclusion	p. 160
10 Knowledge-Driven Segmentation and Classification	p. 163
10.1 Related Work	p. 164
10.2 Semantic Image Segmentation	p. 165
10.2.1 Graph Representation of an Image	p. 165
10.2.2 Image Graph Initialization	p. 165
10.2.3 Semantic Region Growing	p. 167
10.3 Using Contextual Knowledge to Aid Visual Analysis	p. 170
10.3.1 Contextual Knowledge Formulation	p. 170
10.3.2 Contextual Relevance	p. 173
10.4 Spatial Context and Optimization	p. 177
10.4.1 Introduction	p. 177
10.4.2 Low-Level Visual Information Processing	p. 177
10.4.3 Initial Region-Concept Association	p. 178
10.4.4 Final Region-Concept Association	p. 179
10.5 Conclusions	p. 181
11 Reasoning for Multimedia Analysis	p. 183
11.1 Fuzzy DL Reasoning	p. 184
11.1.1 The Fuzzy DL f-SHIN	p. 184
11.1.2 The Tableaux Algorithm	p. 185
11.1.3 The FiRE Fuzzy Reasoning Engine	p. 187
11.2 Spatial Features for Image Region Labeling	p. 192
11.2.1 Fuzzy Constraint Satisfaction Problems	p. 192
11.2.2 Exploiting Spatial Features Using Fuzzy Constraint Reasoning	p. 193
11.3 Fuzzy Rule Based Reasoning Engine	p. 196
11.4 Reasoning over Resources Complementary to Audiovisual Streams	p. 201
12 Multi-Modal Analysis for Content Structuring and Event Detection	p. 205
12.1 Moving Beyond Shots for Extracting Semantics	p. 206
12.2 A Multi-Modal Approach	p. 207
12.3 Case Studies	p. 207
12.4 Case Study 1: Field Sports	p. 208
12.4.1 Content Structuring	p. 208
12.4.2 Concept Detection Leveraging Complementary Text Sources	p. 213
12.5 Case Study 2: Fictional Content	p. 214
12.5.1 Content Structuring	p. 215
12.5.2 Concept Detection Leveraging Audio Description	p. 219
12.6 Conclusions and Future Work	p. 221
13 Multimedia Annotation Tools	p. 223
13.1 State of the Art	p. 224
13.2 SVAT: Professional Video Annotation	p. 225
13.2.1 User Interface	p. 225
13.2.2 Semantic Annotation	p. 228
13.3 KAT: Semi-automatic, Semantic Annotation of Multimedia Content	p. 229
13.3.1 History	p. 231
13.3.2 Architecture	p. 232
13.3.3 Default Plugins	p. 234
13.3.4 Using COMM as an Underlying Model: Issues and Solutions	p. 234
13.3.5 Semi-automatic Annotation: An Example	p. 237
13.4 Conclusions	p. 239
14 Information Organization Issues in Multimedia Retrieval Using Low-Level Features	p. 241
14.1 Efficient Multimedia Indexing Structures	p. 242
14.1.1 An Efficient Access Structure for Multimedia Data	p. 243
14.1.2 Experimental Results	p. 245
14.1.3 Conclusion	p. 249
14.2 Feature Term Based Index	p. 249
14.2.1 Feature Terms	p. 250
14.2.2 Feature Term Distribution	p. 251
14.2.3 Feature Term Extraction	p. 252
14.2.4 Feature Dimension Selection	p. 253
14.2.5 Collection Representation and Retrieval System	p. 254
14.2.6 Experiment	p. 256
14.2.7 Conclusion	p. 258
14.3 Conclusion and Future Trends	p. 259
Acknowledgement	p. 259
15 The Role of Explicit Semantics in Search and Browsing	p. 261
15.1 Basic Search Terminology	p. 261
15.2 Analysis of Semantic Search	p. 262
15.2.1 Query Construction	p. 263
15.2.2 Search Algorithm	p. 265
15.2.3 Presentation of Results	p. 267
15.2.4 Survey Summary	p. 269
15.3 Use Case A: Keyword Search in ClioPatria	p. 270
15.3.1 Query Construction	p. 270
15.3.2 Search Algorithm	p. 270
15.3.3 Result Visualization and Organization	p. 273
15.4 Use Case B: Faceted Browsing in ClioPatria	p. 274
15.4.1 Query Construction	p. 274
15.4.2 Search Algorithm	p. 276
15.4.3 Result Visualization and Organization	p. 276
15.5 Conclusions	p. 277
16 Conclusion	p. 279
References	p. 281
Author Index	p. 301
Subject Index	p. 303
