Title:
Symbolic data analysis : conceptual statistics and data mining
Personal Author:
Billard, L. (Lynne)
Series:
Wiley series in computational statistics
Publication Information:
West Sussex : John Wiley & Sons, 2006
ISBN:
9780470090169
Added Author:
Diday, E.

Item Barcode:
30000010123737
Call Number:
QA278 B54 2006
Material Type:
Open Access Book
Item Category 1:
Book

Summary

With the advent of computers, very large datasets have become routine. Standard statistical methods lack the power and flexibility to analyse such datasets efficiently and to extract the required knowledge. An alternative approach is to summarize a large dataset in such a way that the resulting summary dataset is of a manageable size and yet retains as much of the knowledge in the original dataset as possible. One consequence is that the data may no longer be formatted as single values but may instead be represented by lists, intervals, distributions, and so on. The summarized data have their own internal structure, which must be taken into account in any analysis.
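
As a rough illustration of the summarization idea described above, the following sketch aggregates a handful of classical records into one symbolic description per category. It is not taken from the book or from the SODAS software: the records, the field names (`city`, `age`, `transport`), and the helper `summarize` are all hypothetical, chosen only to show a numeric variable becoming an interval and a categorical variable becoming a distribution.

```python
from collections import Counter

# Hypothetical classical records: one row per individual (made-up values).
records = [
    {"city": "A", "age": 23, "transport": "bus"},
    {"city": "A", "age": 41, "transport": "car"},
    {"city": "A", "age": 35, "transport": "bus"},
    {"city": "B", "age": 52, "transport": "car"},
    {"city": "B", "age": 60, "transport": "car"},
]


def summarize(rows, key):
    """Aggregate classical rows into one symbolic description per category."""
    # Group the individual rows by the chosen category (here: city).
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    summary = {}
    for name, members in groups.items():
        ages = [m["age"] for m in members]
        counts = Counter(m["transport"] for m in members)
        summary[name] = {
            # Interval-valued variable: (minimum, maximum) of the observed ages.
            "age": (min(ages), max(ages)),
            # Distribution-valued variable: relative frequency of each category.
            "transport": {mode: n / len(members) for mode, n in counts.items()},
        }
    return summary


symbolic = summarize(records, "city")
for name, description in symbolic.items():
    print(name, description)
```

Each city is now described by a single row whose cells are an interval and a distribution rather than single values, which is the kind of internal structure the summary says any subsequent analysis must take into account.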

This text presents a unified account of symbolic data, how they arise, and how they are structured. The reader is introduced to symbolic analytic methods described in the consistent statistical framework required to carry out such a summary and subsequent analysis.

The book presents a detailed overview of the methods and applications of symbolic data analysis. It includes numerous real examples taken from a variety of application areas, ranging from health and social sciences to economics and computing. Exercises at the end of each chapter enable readers to develop their understanding of the theory. A supplementary website provides links to download the SODAS software, developed exclusively for symbolic data analysis, along with data sets and further material.

Primarily aimed at statisticians and data analysts, Symbolic Data Analysis is also ideal for scientists working on problems involving large volumes of data from a range of disciplines, including computer science, health and the social sciences. There is also much of use to graduate students taking courses in statistical data analysis.


Author Notes

Edwin Diday is a Professor of Computer Science and Mathematics at the Université Paris Dauphine, France.


Table of Contents

1 Introduction
References
2 Symbolic Data
2.1 Symbolic and Classical Data
2.1.1 Types of data
2.1.2 Dependencies in the data
2.2 Categories, Concepts, and Symbolic Objects
2.2.1 Preliminaries
2.2.2 Descriptions, assertions, extents
2.2.3 Concepts of concepts
2.2.4 Some philosophical aspects
2.2.5 Fuzzy, imprecise, and conjunctive data
2.3 Comparison of Symbolic and Classical Analyses
Exercises
References
3 Basic Descriptive Statistics: One Variate
3.1 Some Preliminaries
3.2 Multi-Valued Variables
3.3 Interval-Valued Variables
3.4 Modal Multi-Valued Variables
3.5 Modal Interval-Valued Variables
Exercises
References
4 Descriptive Statistics: Two or More Variates
4.1 Multi-Valued Variables
4.2 Interval-Valued Variables
4.3 Modal Multi-Valued Variables
4.4 Modal Interval-Valued Variables
4.5 Baseball Interval-Valued Dataset
4.5.1 The data: actual and virtual datasets
4.5.2 Joint histograms
4.5.3 Guiding principles
4.6 Measures of Dependence
4.6.1 Moment dependence
4.6.2 Spearman's rho and copulas
Exercises
References
5 Principal Component Analysis
5.1 Vertices Method
5.2 Centers Method
5.3 Comparison of the Methods
Exercises
References
6 Regression Analysis
6.1 Classical Multiple Regression Model
6.2 Multi-Valued Variables
6.2.1 Single dependent variable
6.2.2 Multi-valued dependent variable
6.3 Interval-Valued Variables
6.4 Histogram-Valued Variables
6.5 Taxonomy Variables
6.6 Hierarchical Variables
Exercises
References
7 Cluster Analysis
7.1 Dissimilarity and Distance Measures
7.1.1 Basic definitions
7.1.2 Multi-valued variables
7.1.3 Interval-valued variables
7.1.4 Mixed-valued variables
7.2 Clustering Structures
7.2.1 Types of clusters: definitions
7.2.2 Construction of clusters: building algorithms
7.3 Partitions
7.4 Hierarchy-Divisive Clustering
7.4.1 Some basics
7.4.2 Multi-valued variables
7.4.3 Interval-valued variables
7.5 Hierarchy-Pyramid Clusters
7.5.1 Some basics
7.5.2 Comparison of hierarchy and pyramid structures
7.5.3 Construction of pyramids
Exercises
References
Data Index
Author Index
Subject Index