Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010163964 | QA76.9.D37 M34 2008 | Open Access Book | Book |
Summary
A data warehouse stores large volumes of historical data required for analytical purposes. This data is extracted from operational databases; transformed into a coherent whole using a multidimensional model that includes measures, dimensions, and hierarchies; and loaded into a data warehouse during the extraction-transformation-loading (ETL) process.
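The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' method or any product's API: the sample rows, the `CITY_TO_COUNTRY` hierarchy, and the function names `extract`, `transform`, `load`, and `roll_up` are all assumptions made for the example. It shows one measure (sales), two dimensions (time and location), and a city-to-country hierarchy used for aggregation.

```python
from collections import defaultdict

# Hypothetical operational rows (illustrative data, not taken from the book).
operational_rows = [
    {"date": "2008-01-15", "city": "Brussels", "amount": "120.50"},
    {"date": "2008-01-20", "city": "Brussels", "amount": "79.50"},
    {"date": "2008-02-03", "city": "San Jose", "amount": "200.00"},
]

# Assumed hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Brussels": "Belgium", "San Jose": "Costa Rica"}


def extract(rows):
    """Extract: read rows from the (here, in-memory) operational source."""
    return list(rows)


def transform(rows):
    """Transform: derive dimension members and a numeric measure per row."""
    facts = []
    for row in rows:
        facts.append({
            "month": row["date"][:7],                 # time dimension, month level
            "city": row["city"],                      # location dimension, city level
            "country": CITY_TO_COUNTRY[row["city"]],  # parent level in the hierarchy
            "sales": float(row["amount"]),            # measure
        })
    return facts


def load(facts, warehouse):
    """Load: append the transformed facts to the warehouse fact table."""
    warehouse.extend(facts)


def roll_up(warehouse, level):
    """Sum the sales measure grouped by the given dimension level."""
    totals = defaultdict(float)
    for fact in warehouse:
        totals[fact[level]] += fact["sales"]
    return dict(totals)


warehouse = []
load(transform(extract(operational_rows)), warehouse)
by_country = roll_up(warehouse, "country")
by_month = roll_up(warehouse, "month")
```

Calling `roll_up` with a different level name, such as `"city"`, aggregates the same facts at a finer level of the location hierarchy.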
Malinowski and Zimányi explain conventional data warehouse design in detail, covering in particular complex hierarchy modeling. Additionally, they address two innovative domains recently introduced to extend the capabilities of data warehouse systems, namely the management of spatial and temporal information. Their presentation covers the different phases of the design process: requirements specification and conceptual, logical, and physical design. They include three different approaches to requirements specification, depending on whether users, operational data sources, or both are the driving force in the requirements gathering process, and they show how each approach leads to the creation of a conceptual multidimensional model. Throughout the book, the concepts are illustrated using many real-world examples and complemented by sample implementations for Microsoft's Analysis Services 2005 and Oracle 10g with the OLAP and Spatial extensions.
For researchers, this book serves as an introduction to the state of the art in data warehouse design, with many references to more detailed sources. Providing a clear and concise presentation of the major concepts and results of data warehouse design, it can also be used as the basis of a graduate or advanced undergraduate course. The book may help experienced data warehouse designers to broaden their analysis possibilities by incorporating spatial and temporal information. Finally, experts in spatial databases or geographical information systems could benefit from the data warehouse vision for building innovative spatial analytical applications.
Author Notes
Elzbieta Malinowski is a professor at the Department of Computer and Information Science at the Universidad de Costa Rica and a professional consultant in Costa Rica in the area of data warehousing. She received her master's degrees from Saint Petersburg Electrotechnical University, Russia (1982) and the University of Florida, USA (1996), and her Ph.D. degree from the Université Libre de Bruxelles, Belgium (2006). Her research interests include data warehouses, OLAP systems, geographic information systems, and temporal databases.
Esteban Zimányi is a professor of computer science at the Engineering Department of the Université Libre de Bruxelles (ULB), Belgium. He received his BSc degree (1988) and his doctorate (1992) in computer science from the Sciences Department at the ULB. His current research interests include conceptual modeling, geographic information systems, spatio-temporal databases, and the semantic web.
Table of Contents
1 Introduction | p. 1 |
1.1 Overview | p. 2 |
1.1.1 Conventional Data Warehouses | p. 2 |
1.1.2 Spatial Databases and Spatial Data Warehouses | p. 4 |
1.1.3 Temporal Databases and Temporal Data Warehouses | p. 5 |
1.1.4 Conceptual Modeling for Databases and Data Warehouses | p. 6 |
1.1.5 A Method for Data Warehouse Design | p. 7 |
1.2 Motivation for the Book | p. 8 |
1.3 Objective of the Book and its Contributions to Research | p. 11 |
1.3.1 Conventional Data Warehouses | p. 12 |
1.3.2 Spatial Data Warehouses | p. 13 |
1.3.3 Temporal Data Warehouses | p. 13 |
1.4 Organization of the Book | p. 14 |
2 Introduction to Databases and Data Warehouses | p. 17 |
2.1 Database Concepts | p. 18 |
2.2 The Entity-Relationship Model | p. 19 |
2.3 Logical Database Design | p. 23 |
2.3.1 The Relational Model | p. 23 |
2.3.2 The Object-Relational Model | p. 32 |
2.4 Physical Database Design | p. 38 |
2.5 Data Warehouses | p. 41 |
2.6 The Multidimensional Model | p. 43 |
2.6.1 Hierarchies | p. 44 |
2.6.2 Measure Aggregation | p. 45 |
2.6.3 OLAP Operations | p. 47 |
2.7 Logical Data Warehouse Design | p. 49 |
2.8 Physical Data Warehouse Design | p. 51 |
2.9 Data Warehouse Architecture | p. 55 |
2.9.1 Back-End Tier | p. 56 |
2.9.2 Data Warehouse Tier | p. 57 |
2.9.3 OLAP Tier | p. 58 |
2.9.4 Front-End Tier | p. 58 |
2.9.5 Variations of the Architecture | p. 59 |
2.10 Analysis Services 2005 | p. 59 |
2.10.1 Defining an Analysis Services Database | p. 60 |
2.10.2 Data Sources | p. 61 |
2.10.3 Data Source Views | p. 61 |
2.10.4 Dimensions | p. 62 |
2.10.5 Cubes | p. 64 |
2.11 Oracle 10g with the OLAP Option | p. 66 |
2.11.1 Multidimensional Model | p. 67 |
2.11.2 Multidimensional Database Design | p. 68 |
2.11.3 Data Source Management | p. 69 |
2.11.4 Dimensions | p. 70 |
2.11.5 Cubes | p. 71 |
2.12 Conclusion | p. 73 |
3 Conventional Data Warehouses | p. 75 |
3.1 MultiDim: A Conceptual Multidimensional Model | p. 76 |
3.2 Data Warehouse Hierarchies | p. 79 |
3.2.1 Simple Hierarchies | p. 81 |
3.2.2 Nonstrict Hierarchies | p. 88 |
3.2.3 Alternative Hierarchies | p. 93 |
3.2.4 Parallel Hierarchies | p. 94 |
3.3 Advanced Modeling Aspects | p. 97 |
3.3.1 Modeling of Complex Hierarchies | p. 97 |
3.3.2 Role-Playing Dimensions | p. 100 |
3.3.3 Fact Dimensions | p. 101 |
3.3.4 Multivalued Dimensions | p. 101 |
3.4 Metamodel of the MultiDim Model | p. 106 |
3.5 Mapping to the Relational and Object-Relational Models | p. 107 |
3.5.1 Rationale | p. 107 |
3.5.2 Mapping Rules | p. 108 |
3.6 Logical Representation of Hierarchies | p. 112 |
3.6.1 Simple Hierarchies | p. 112 |
3.6.2 Nonstrict Hierarchies | p. 120 |
3.6.3 Alternative Hierarchies | p. 123 |
3.6.4 Parallel Hierarchies | p. 123 |
3.7 Implementing Hierarchies | p. 124 |
3.7.1 Hierarchies in Analysis Services 2005 | p. 124 |
3.7.2 Hierarchies in Oracle OLAP 10g | p. 126 |
3.8 Related Work | p. 128 |
3.9 Summary | p. 130 |
4 Spatial Data Warehouses | p. 133 |
4.1 Spatial Databases: General Concepts | p. 134 |
4.1.1 Spatial Objects | p. 134 |
4.1.2 Spatial Data Types | p. 134 |
4.1.3 Reference Systems | p. 136 |
4.1.4 Topological Relationships | p. 136 |
4.1.5 Conceptual Models for Spatial Data | p. 138 |
4.1.6 Implementation Models for Spatial Data | p. 138 |
4.1.7 Models for Storing Collections of Spatial Objects | p. 139 |
4.1.8 Architecture of Spatial Systems | p. 140 |
4.2 Spatial Extension of the MultiDim Model | p. 141 |
4.3 Spatial Levels | p. 143 |
4.4 Spatial Hierarchies | p. 143 |
4.4.1 Hierarchy Classification | p. 143 |
4.4.2 Topological Relationships Between Spatial Levels | p. 149 |
4.5 Spatial Fact Relationships | p. 152 |
4.6 Spatiality and Measures | p. 153 |
4.6.1 Spatial Measures | p. 153 |
4.6.2 Conventional Measures Resulting from Spatial Operations | p. 156 |
4.7 Metamodel of the Spatially Extended MultiDim Model | p. 157 |
4.8 Rationale of the Logical-Level Representation | p. 159 |
4.8.1 Using the Object-Relational Model | p. 159 |
4.8.2 Using Spatial Extensions of DBMSs | p. 160 |
4.8.3 Preserving Semantics | p. 161 |
4.9 Object-Relational Representation of Spatial Data Warehouses | p. 162 |
4.9.1 Spatial Levels | p. 162 |
4.9.2 Spatial Attributes | p. 164 |
4.9.3 Spatial Hierarchies | p. 165 |
4.9.4 Spatial Fact Relationships | p. 170 |
4.9.5 Measures | p. 172 |
4.10 Summary of the Mapping Rules | p. 174 |
4.11 Related Work | p. 175 |
4.12 Summary | p. 178 |
5 Temporal Data Warehouses | p. 181 |
5.1 Slowly Changing Dimensions | p. 182 |
5.2 Temporal Databases: General Concepts | p. 185 |
5.2.1 Temporality Types | p. 185 |
5.2.2 Temporal Data Types | p. 186 |
5.2.3 Synchronization Relationships | p. 187 |
5.2.4 Conceptual and Logical Models for Temporal Databases | p. 189 |
5.3 Temporal Extension of the MultiDim Model | p. 190 |
5.3.1 Temporality Types | p. 190 |
5.3.2 Overview of the Model | p. 192 |
5.4 Temporal Support for Levels | p. 195 |
5.5 Temporal Hierarchies | p. 196 |
5.5.1 Nontemporal Relationships Between Temporal Levels | p. 196 |
5.5.2 Temporal Relationships Between Nontemporal Levels | p. 198 |
5.5.3 Temporal Relationships Between Temporal Levels | p. 198 |
5.5.4 Instant and Lifespan Cardinalities | p. 199 |
5.6 Temporal Fact Relationships | p. 201 |
5.7 Temporal Measures | p. 202 |
5.7.1 Temporal Support for Measures | p. 202 |
5.7.2 Measure Aggregation for Temporal Relationships | p. 207 |
5.8 Managing Different Temporal Granularities | p. 207 |
5.8.1 Conversion Between Granularities | p. 208 |
5.8.2 Different Granularities in Measures and Dimensions | p. 208 |
5.8.3 Different Granularities in the Source Systems and in the Data Warehouse | p. 210 |
5.9 Metamodel of the Temporally Extended MultiDim Model | p. 211 |
5.10 Rationale of the Logical-Level Representation | p. 213 |
5.11 Logical Representation of Temporal Data Warehouses | p. 214 |
5.11.1 Temporality Types | p. 214 |
5.11.2 Levels with Temporal Support | p. 216 |
5.11.3 Parent-Child Relationships | p. 220 |
5.11.4 Fact Relationships and Temporal Measures | p. 226 |
5.12 Summary of the Mapping Rules | p. 228 |
5.13 Implementation Considerations | p. 229 |
5.13.1 Integrity Constraints | p. 229 |
5.13.2 Measure Aggregation | p. 234 |
5.14 Related Work | p. 237 |
5.14.1 Types of Temporal Support | p. 237 |
5.14.2 Conceptual Models for Temporal Data Warehouses | p. 238 |
5.14.3 Logical Representation | p. 240 |
5.14.4 Temporal Granularity | p. 241 |
5.15 Summary | p. 242 |
6 Designing Conventional Data Warehouses | p. 245 |
6.1 Current Approaches to Data Warehouse Design | p. 246 |
6.1.1 Data Mart and Data Warehouse Design | p. 246 |
6.1.2 Design Phases | p. 248 |
6.1.3 Requirements Specification for Data Warehouse Design | p. 248 |
6.2 A Method for Data Warehouse Design | p. 250 |
6.3 A University Case Study | p. 251 |
6.4 Requirements Specification | p. 253 |
6.4.1 Analysis-Driven Approach | p. 253 |
6.4.2 Source-Driven Approach | p. 261 |
6.4.3 Analysis/Source-Driven Approach | p. 265 |
6.5 Conceptual Design | p. 265 |
6.5.1 Analysis-Driven Approach | p. 266 |
6.5.2 Source-Driven Approach | p. 275 |
6.5.3 Analysis/Source-Driven Approach | p. 278 |
6.6 Characterization of the Various Approaches | p. 280 |
6.6.1 Analysis-Driven Approach | p. 280 |
6.6.2 Source-Driven Approach | p. 282 |
6.6.3 Analysis/Source-Driven Approach | p. 283 |
6.7 Logical Design | p. 283 |
6.7.1 Logical Representation of Data Warehouse Schemas | p. 283 |
6.7.2 Defining ETL Processes | p. 287 |
6.8 Physical Design | p. 288 |
6.8.1 Data Warehouse Schema Implementation | p. 288 |
6.8.2 Implementation of ETL Processes | p. 294 |
6.9 Method Summary | p. 295 |
6.9.1 Analysis-Driven Approach | p. 296 |
6.9.2 Source-Driven Approach | p. 296 |
6.9.3 Analysis/Source-Driven Approach | p. 297 |
6.10 Related Work | p. 298 |
6.10.1 Overall Methods | p. 300 |
6.10.2 Requirements Specification | p. 301 |
6.11 Summary | p. 305 |
7 Designing Spatial and Temporal Data Warehouses | p. 307 |
7.1 Current Approaches to the Design of Spatial and Temporal Databases | p. 308 |
7.2 A Risk Management Case Study | p. 308 |
7.3 A Method for Spatial-Data-Warehouse Design | p. 310 |
7.3.1 Requirements Specification and Conceptual Design | p. 310 |
7.3.2 Logical and Physical Design | p. 321 |
7.4 Revisiting the University Case Study | p. 324 |
7.5 A Method for Temporal-Data-Warehouse Design | p. 325 |
7.5.1 Requirements Specification and Conceptual Design | p. 326 |
7.5.2 Logical and Physical Design | p. 333 |
7.6 Method Summary | p. 337 |
7.6.1 Analysis-Driven Approach | p. 337 |
7.6.2 Source-Driven Approach | p. 338 |
7.6.3 Analysis/Source-Driven Approach | p. 339 |
7.7 Related Work | p. 340 |
7.8 Summary | p. 342 |
8 Conclusions and Future Work | p. 345 |
8.1 Conclusions | p. 345 |
8.2 Future Work | p. 348 |
8.2.1 Conventional Data Warehouses | p. 348 |
8.2.2 Spatial Data Warehouses | p. 349 |
8.2.3 Temporal Data Warehouses | p. 351 |
8.2.4 Spatiotemporal Data Warehouses | p. 352 |
8.2.5 Design Methods | p. 353 |
A Formalization of the MultiDim Model | p. 355 |
A.1 Notation | p. 355 |
A.2 Predefined Data Types | p. 355 |
A.3 Metavariables | p. 356 |
A.4 Abstract Syntax | p. 357 |
A.5 Examples Using the Abstract Syntax | p. 359 |
A.5.1 Conventional Data Warehouse | p. 359 |
A.5.2 Spatial Data Warehouse | p. 361 |
A.5.3 Temporal Data Warehouse | p. 364 |
A.6 Semantics | p. 366 |
A.6.1 Semantics of the Predefined Data Types | p. 367 |
A.6.2 The Space Model | p. 367 |
A.6.3 The Time Model | p. 371 |
A.6.4 Semantic Domains | p. 372 |
A.6.5 Auxiliary Functions | p. 372 |
A.6.6 Semantic Functions | p. 375 |
B Graphical Notation | p. 383 |
B.1 Entity-Relationship Model | p. 383 |
B.2 Relational and Object-Relational Models | p. 385 |
B.3 Conventional Data Warehouses | p. 386 |
B.4 Spatial Data Warehouses | p. 388 |
B.5 Temporal Data Warehouses | p. 389 |
References | p. 391 |
Glossary | p. 411 |
Index | p. 425 |