Title:
Data on the web : from relations to semistructured data and XML
Personal Author:
Publication Information:
San Francisco, CA : Morgan Kaufmann Publishers, c2000
ISBN:
9781558606227
Added Author:

Available:

Item Barcode: 30000004517607
Call Number: QA76.9.D3 A25 2000
Material Type: Open Access Book
Item Category 1: Book

Summary

The Web is causing a revolution in how we represent, retrieve, and process information. Its growth has given us a universally accessible database, but in the form of a largely unorganized collection of documents. This is changing, thanks to the simultaneous emergence of new ways of representing data: from within the Web community, XML; and from within the database community, semistructured data. The convergence of these two approaches has rendered them nearly identical. Now, there is a concerted effort to develop effective techniques for retrieving and processing both kinds of data.

Data on the Web is the only comprehensive, up-to-date examination of these rapidly evolving retrieval and processing strategies, which are of critical importance for almost all Web- and data-intensive enterprises. This book offers detailed solutions to a wide range of practical problems while equipping you with a keen understanding of the fundamental issues (including data models, query languages, and schemas) involved in their design, implementation, and optimization. You'll find it to be compelling reading, whether your interest is that of a practitioner involved in a database-driven Web enterprise or a researcher in computer science or a related field.


Author Notes

Serge Abiteboul is a Senior Researcher at I.N.R.I.A. and a professor at the École Polytechnique. He received his Ph.D. in computer science from the University of Southern California in 1982 and his Thèse d'État from the University of Paris XI in 1986. His recent research has focused on object databases, digital libraries, semistructured data, data integration, and electronic commerce.

Peter Buneman is a professor in the Computer and Information Science Department at the University of Pennsylvania. He earned his undergraduate degree from Cambridge and his Ph.D. from the University of Warwick. His research interests include databases, programming languages, cognitive science, and classification theory.

Dan Suciu is a researcher at AT&T Labs who received his Ph.D. from the University of Pennsylvania in 1995. He has devoted his recent research and publications to various aspects of semistructured data, organizing several workshops on the topic and serving on the committees of ICDT, PODS, and EDBT.


Reviews

Library Journal Review

Most data on the web are not well structured, which makes search and retrieval difficult, since the spiders, robots, and other search engines don't really understand the context of the data they are indexing and storing. This very advanced book examines new retrieval and processing techniques, such as semistructured data and XML (as a data transfer language), that aim to merge a document-based web with a data-driven infrastructure. Hardcore programmers will want this. Recommended for university and large public libraries. (c) Copyright 2010. Library Journals LLC, a wholly owned subsidiary of Media Source, Inc. No redistribution permitted.


Table of Contents

Foreword
Acknowledgments
1 Introduction
1.1 Audience
1.2 Web Data and the Two Cultures
1.3 Organization
I Data Model
2 A Syntax for Data
2.1 Base Types
2.2 Representing Relational Databases
2.3 Representing Object Databases
2.4 Specification of Syntax
2.5 The Object Exchange Model (OEM)
2.6 Object Databases
2.7 Other Representations
2.7.1 ACeDB
2.8 Terminology
2.9 Bibliographic Remarks
3 XML
3.1 Basic Syntax
3.1.1 XML Elements
3.1.2 XML Attributes
3.1.3 Well-Formed XML Documents
3.2 XML and Semistructured Data
3.2.1 XML Graph Model
3.2.2 XML References
3.2.3 Order
3.2.4 Mixing Elements and Text
3.2.5 Other XML Constructs
3.3 Document Type Definitions
3.3.1 A Simple DTD
3.3.2 DTDs as Grammars
3.3.3 DTDs as Schemas
3.3.4 Declaring Attributes in DTDs
3.3.5 Valid XML Documents
3.3.6 Limitations of DTDs as Schemas
3.4 Document Navigation
3.5 DCD
3.6 Paraphernalia
3.6.1 RDF
3.6.2 Stylesheets
3.6.3 SAX and DOM
3.7 Bibliographic Remarks
II Queries
4 Query Languages
4.1 Path Expressions
4.2 A Core Language
4.2.1 The Basic Syntax
4.3 More on Lorel
4.3.1 Less Essential Syntactic Sugaring
4.4 UnQL
4.5 Label and Path Variables
4.5.1 Paths as Data
4.6 Mixing with Structured Data
4.7 Bibliographic Remarks
5 Query Languages for XML
5.1 XML-QL
5.1.1 Constructing New XML Data
5.1.2 Processing Optional Elements with Nested Queries
5.1.3 Grouping with Nested Queries
5.1.4 Binding Elements and Contents
5.1.5 Querying Attributes
5.1.6 Joining Elements by Value
5.1.7 Tag Variables
5.1.8 Regular Path Expressions
5.1.9 Order
5.2 XSL
5.3 Bibliographic Remarks
6 Interpretation and Advanced Features
6.1 First-Order Interpretation
6.2 Object Creation
6.3 Graphical Languages
6.4 Structural Recursion
6.4.1 Structural Recursion on Trees
6.4.2 XSL and Structural Recursion
6.4.3 Bisimulation in Semistructured Data
6.4.4 Structural Recursion on Cyclic Data
6.5 StruQL
6.6 Bibliographic Remarks
III Types
7 Typing Semistructured Data
7.1 What Is Typing Good For?
7.1.1 Browsing and Querying Data
7.1.2 Optimizing Query Evaluation
7.1.3 Improving Storage
7.2 Analyzing the Problem
7.3 Schema Formalisms
7.3.1 Logic
7.3.2 Datalog
7.3.3 Simulation
7.3.4 Comparison between Datalog Rules and Simulation
7.4 Extracting Schemas from Data
7.4.1 Data Guides
7.4.2 Extracting Datalog Rules from Data
7.5 Inferring Schemas from Queries
7.6 Sharing, Multiplicity, and Order
7.6.1 Sharing
7.6.2 Attribute Multiplicity
7.6.3 Order
7.7 Path Constraints
7.7.1 Constraints in Relational Databases
7.7.2 Constraints in Object-Oriented Databases
7.7.3 Path Constraints in Semistructured Data
7.7.4 The Constraint Inference Problem
7.7.5 Constraints in XML
7.8 Bibliographic Remarks
IV Systems
8 Query Processing
8.1 Architecture
8.2 Semistructured Data Servers
8.2.1 Storage
8.2.2 Indexing
8.2.3 Distributed Evaluation
8.3 Mediators for Semistructured Data
8.3.1 A Simple Mediator: Converting Relational Data to XML
8.3.2 Mediators for Data Integration
8.4 Incremental Maintenance
8.5 Bibliographic Remarks
9 The Lore System
9.1 Architecture
9.2 Query Processing and Indexes
9.3 Other Aspects of Lore
9.3.1 The Data Guide
9.3.2 Managing External Data
9.3.3 Proximity Search
9.3.4 Views
9.3.5 Dynamic OEM and Chorel
9.3.6 Mixing Structured and Semistructured in Ozone
9.4 Bibliographic Remarks
10 Strudel
10.1 An Example
10.1.1 Data Management
10.1.2 Structure Management
10.1.3 Management of the Graphical Presentation
10.2 Advantages of Declarative Web Site Design
10.3 Bibliographic Remarks
11 Database Products Supporting XML
11.1 Architecture
11.2 Storage
11.3 Application Programming Interface
11.4 Query language
11.5 Scalability
11.6 Bibliographic Remarks
Bibliography
Index
About the Authors