Title:
Correspondence analysis and data coding with java and R
Personal Author:
Murtagh, Fionn
Series:
Computer science and data analysis series
Publication Information:
Boca Raton, FL : Chapman & Hall/CRC, 2005
ISBN:
9781584885283

Item Barcode:
30000004734905
Call Number:
QA76.73.J38 M87 2005
Material Type:
Open Access Book
Item Category 1:
Book

Summary

Developed by Jean-Paul Benzécri more than 30 years ago, correspondence analysis as a framework for analyzing data quickly found widespread popularity in Europe. The topicality and importance of correspondence analysis continue, and with the tremendous computing power now available and new fields of application emerging, its significance is greater than ever.

Correspondence Analysis and Data Coding with Java and R clearly demonstrates why this technique remains important and, in the eyes of many, unsurpassed as an analysis framework. After presenting some historical background, the author gives a theoretical overview of the mathematics and underlying algorithms of correspondence analysis and hierarchical clustering. The focus then shifts to data coding, with a survey of the widely varied possibilities correspondence analysis offers and an introduction to the Java software for correspondence analysis, clustering, and interpretation tools. A chapter of case studies follows, in which the author explores applications to areas such as shape analysis and time-evolving data. The final chapter reviews the wealth of studies on textual content as well as textual form carried out by Benzécri and his research lab. These discussions show the importance of correspondence analysis to artificial intelligence as well as to stylometry and other fields.

This book not only shows why correspondence analysis is important, but with a clear presentation replete with advice and guidance, also shows how to put this technique into practice. Downloadable software and data sets allow quick, hands-on exploration of innovative correspondence analysis applications.


Author Notes

Murtagh, Fionn


Table of Contents

1 Introduction
1.1 Data Analysis
1.2 Notes on the History of Data Analysis
1.2.1 Biometry
1.2.2 Era Piscatoria
1.2.3 Psychometrics
1.2.4 Analysis of Proximities
1.2.5 Genesis of Correspondence Analysis
1.3 Correspondence Analysis or Principal Components Analysis
1.3.1 Similarities of These Two Algorithms
1.3.2 Introduction to Principal Components Analysis
1.3.3 An Illustrative Example
1.3.4 Principal Components Analysis of Globular Clusters
1.3.5 Correspondence Analysis of Globular Clusters
1.4 R Software for Correspondence Analysis and Clustering
1.4.1 Fuzzy or Piecewise Linear Coding
1.4.2 Utility for Plotting Axes
1.4.3 Correspondence Analysis Program
1.4.4 Running the Analysis and Displaying Results
1.4.5 Hierarchical Clustering
1.4.6 Handling Large Data Sets
2 Theory of Correspondence Analysis
2.1 Vectors and Projections
2.2 Factors
2.2.1 Review of Metric Spaces
2.2.2 Clouds of Points, Masses, and Inertia
2.2.3 Notation for Factors
2.2.4 Properties of Factors
2.2.5 Properties of Factors: Tensor Notation
2.3 Transform
2.3.1 Forward Transform
2.3.2 Inverse Transform
2.3.3 Decomposition of Inertia
2.3.4 Relative and Absolute Contributions
2.3.5 Reduction of Dimensionality
2.3.6 Interpretation of Results
2.3.7 Analysis of the Dual Spaces
2.3.8 Supplementary Elements
2.4 Algebraic Perspective
2.4.1 Processing
2.4.2 Motivation
2.4.3 Operations
2.4.4 Axes and Factors
2.4.5 Multiple Correspondence Analysis
2.4.6 Summary of Correspondence Analysis Properties
2.5 Clustering
2.5.1 Hierarchical Agglomerative Clustering
2.5.2 Minimum Variance Agglomerative Criterion
2.5.3 Lance-Williams Dissimilarity Update Formula
2.5.4 Reciprocal Nearest Neighbors and Reducibility
2.5.5 Nearest-Neighbor Chain Algorithm
2.5.6 Minimum Variance Method in Perspective
2.5.7 Minimum Variance Method: Mathematical Properties
2.5.8 Simultaneous Analysis of Factors and Clusters
2.6 Questions
2.7 Further R Software for Correspondence Analysis
2.7.1 Supplementary Elements
2.7.2 FACOR: Interpretation of Factors and Clusters
2.7.3 VACOR: Interpretation of Variables and Clusters
2.7.4 Hierarchical Clustering in C, Callable from R
2.8 Summary
3 Input Data Coding
3.1 Introduction
3.1.1 The Fundamental Role of Coding
3.1.2 "Semantic Embedding"
3.1.3 Input Data Encodings
3.1.4 Input Data Analyzed Without Transformation
3.2 From Doubling to Fuzzy Coding and Beyond
3.2.1 Doubling
3.2.2 Complete Disjunctive Form
3.2.3 Fuzzy, Piecewise Linear or Barycentric Coding
3.2.4 General Discussion of Data Coding
3.2.5 From Fuzzy Coding to Possibility Theory
3.3 Assessment of Coding Methods
3.4 The Personal Equation and Double Rescaling
3.5 Case Study: DNA Exon and Intron Junction Discrimination
3.6 Conclusions on Coding
3.7 Java Software
3.7.1 Running the Java Software
4 Examples and Case Studies
4.1 Introduction to Analysis of Size and Shape
4.1.1 Morphometry of Prehistoric Thai Goblets
4.1.2 Software Used
4.2 Comparison of Prehistoric and Modern Groups of Canids
4.2.1 Software Used
4.3 Craniometric Data from Ancient Egyptian Tombs
4.3.1 Software Used
4.4 Time-Varying Data Analysis: Examples from Economics
4.4.1 Imports and Exports of Phosphates
4.4.2 Services and Other Sectors in Economic Growth
4.5 Financial Modeling and Forecasting
4.5.1 Introduction
4.5.2 Brownian Motion
4.5.3 Granularity of Coding
4.5.4 Fingerprinting the Price Movements
4.5.5 Conclusions
5 Content Analysis of Text
5.1 Introduction
5.1.1 Accessing Content
5.1.2 The Work of J.-P. Benzécri
5.1.3 Objectives and Some Findings
5.1.4 Outline of the Chapter
5.2 Correspondence Analysis
5.2.1 Analyzing Data
5.2.2 Textual Data Preprocessing
5.3 Tool Words: Between Analysis of Form and Analysis of Content
5.3.1 Tool Words versus Full Words
5.3.2 Tool Words in Various Languages
5.3.3 Tool Words versus Metalanguages or Ontologies
5.3.4 Refinement of Tool Words
5.3.5 Tool Words in Survey Analysis
5.3.6 The Text Aggregates Studied
5.4 Towards Content Analysis
5.4.1 Intra-Document Analysis of Content
5.4.2 Comparative Semantics: Diagnosis versus Prognosis
5.4.3 Semantics of Connotation and Denotation
5.4.4 Discipline-Based Theme Analysis
5.4.5 Mapping Cognitive Processes
5.4.6 History and Evolution of Ideas
5.4.7 Doctrinal Content and Stylistic Expression
5.4.8 Interpreting Antinomies Through Cluster Branchings
5.4.9 The Hypotheses of Plato on The One
5.5 Textual and Documentary Typology
5.5.1 Assessing Authorship
5.5.2 Further Studies with Tool Words and Miscellaneous Approaches
5.6 Conclusion: Methodology in Free Text Analysis
5.7 Software for Text Processing
5.8 Introduction to the Text Analysis Case Studies
5.9 Eight Hypotheses of Parmenides Regarding the One
5.10 Comparative Study of Reality, Fable and Dream
5.10.1 Aviation Accidents
5.10.2 Dream Reports
5.10.3 Grimm Fairy Tales
5.10.4 Three Jane Austen Novels
5.10.5 Set of Texts
5.10.6 Tool Words
5.10.7 Domain Content Words
5.10.8 Analysis of Domains through Content-Oriented Words
5.11 Single Document Analysis
5.11.1 The Data: Aristotle's Categories
5.11.2 Structure of Presentation
5.11.3 Evolution of Presentation
5.12 Conclusion on Text Analysis Case Studies
6 Concluding Remarks
References
Index