Title:
Pattern recognition algorithms for data mining : scalability, knowledge discovery and soft granular computing
Personal Author:
Pal, Sankar K.
Publication Information:
Boca Raton, FL : Chapman & Hall/CRC, 2004
ISBN:
9781584884576
Added Author:
Mitra, Pabitra

Available:

Item Barcode     Call Number            Material Type     Item Category 1
30000004736272   QA76.9.D343 P34 2004   Open Access Book  Book
30000010078815   QA76.9.D343 P34 2004   Open Access Book  Book
30000010077425   QA76.9.D343 P34 2004   Open Access Book  Book


Summary

Pattern Recognition Algorithms for Data Mining addresses different pattern recognition (PR) tasks in a unified framework with both theoretical and experimental results. Tasks covered include data condensation, feature selection, case generation, clustering/classification, and rule generation and evaluation. This volume presents various theories, methodologies, and algorithms, using both classical approaches and hybrid paradigms. The authors emphasize large datasets with overlapping, intractable, or nonlinear boundary classes, and datasets that demonstrate granular computing in soft frameworks.

Organized into eight chapters, the book begins with an introduction to PR, data mining, and knowledge discovery concepts. The authors analyze the tasks of multiscale data condensation and dimensionality reduction, then explore the problem of learning with the support vector machine (SVM). They conclude by highlighting the significance of granular computing for different mining tasks in a soft paradigm.


Author Notes

Pabitra Mitra is an Assistant Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Kanpur.


Table of Contents

Foreword p. xiii
Preface p. xxi
List of Tables p. xxv
List of Figures p. xxvii
1 Introduction p. 1
1.1 Introduction p. 1
1.2 Pattern Recognition in Brief p. 3
1.2.1 Data acquisition p. 4
1.2.2 Feature selection/extraction p. 4
1.2.3 Classification p. 5
1.3 Knowledge Discovery in Databases (KDD) p. 7
1.4 Data Mining p. 10
1.4.1 Data mining tasks p. 10
1.4.2 Data mining tools p. 12
1.4.3 Applications of data mining p. 12
1.5 Different Perspectives of Data Mining p. 14
1.5.1 Database perspective p. 14
1.5.2 Statistical perspective p. 15
1.5.3 Pattern recognition perspective p. 15
1.5.4 Research issues and challenges p. 16
1.6 Scaling Pattern Recognition Algorithms to Large Data Sets p. 17
1.6.1 Data reduction p. 17
1.6.2 Dimensionality reduction p. 18
1.6.3 Active learning p. 19
1.6.4 Data partitioning p. 19
1.6.5 Granular computing p. 20
1.6.6 Efficient search algorithms p. 20
1.7 Significance of Soft Computing in KDD p. 21
1.8 Scope of the Book p. 22
2 Multiscale Data Condensation p. 29
2.1 Introduction p. 29
2.2 Data Condensation Algorithms p. 32
2.2.1 Condensed nearest neighbor rule p. 32
2.2.2 Learning vector quantization p. 33
2.2.3 Astrahan's density-based method p. 34
2.3 Multiscale Representation of Data p. 34
2.4 Nearest Neighbor Density Estimate p. 37
2.5 Multiscale Data Condensation Algorithm p. 38
2.6 Experimental Results and Comparisons p. 40
2.6.1 Density estimation p. 41
2.6.2 Test of statistical significance p. 41
2.6.3 Classification: Forest cover data p. 47
2.6.4 Clustering: Satellite image data p. 48
2.6.5 Rule generation: Census data p. 49
2.6.6 Study on scalability p. 52
2.6.7 Choice of scale parameter p. 52
2.7 Summary p. 52
3 Unsupervised Feature Selection p. 59
3.1 Introduction p. 59
3.2 Feature Extraction p. 60
3.3 Feature Selection p. 62
3.3.1 Filter approach p. 63
3.3.2 Wrapper approach p. 64
3.4 Feature Selection Using Feature Similarity (FSFS) p. 64
3.4.1 Feature similarity measures p. 65
3.4.2 Feature selection through clustering p. 68
3.5 Feature Evaluation Indices p. 71
3.5.1 Supervised indices p. 71
3.5.2 Unsupervised indices p. 72
3.5.3 Representation entropy p. 73
3.6 Experimental Results and Comparisons p. 74
3.6.1 Comparison: Classification and clustering performance p. 74
3.6.2 Redundancy reduction: Quantitative study p. 79
3.6.3 Effect of cluster size p. 80
3.7 Summary p. 82
4 Active Learning Using Support Vector Machine p. 83
4.1 Introduction p. 83
4.2 Support Vector Machine p. 86
4.3 Incremental Support Vector Learning with Multiple Points p. 88
4.4 Statistical Query Model of Learning p. 89
4.4.1 Query strategy p. 90
4.4.2 Confidence factor of support vector set p. 90
4.5 Learning Support Vectors with Statistical Queries p. 91
4.6 Experimental Results and Comparison p. 94
4.6.1 Classification accuracy and training time p. 94
4.6.2 Effectiveness of the confidence factor p. 97
4.6.3 Margin distribution p. 97
4.7 Summary p. 101
5 Rough-fuzzy Case Generation p. 103
5.1 Introduction p. 103
5.2 Soft Granular Computing p. 105
5.3 Rough Sets p. 106
5.3.1 Information systems p. 107
5.3.2 Indiscernibility and set approximation p. 107
5.3.3 Reducts p. 108
5.3.4 Dependency rule generation p. 110
5.4 Linguistic Representation of Patterns and Fuzzy Granulation p. 111
5.5 Rough-fuzzy Case Generation Methodology p. 114
5.5.1 Thresholding and rule generation p. 115
5.5.2 Mapping dependency rules to cases p. 117
5.5.3 Case retrieval p. 118
5.6 Experimental Results and Comparison p. 120
5.7 Summary p. 121
6 Rough-fuzzy Clustering p. 123
6.1 Introduction p. 123
6.2 Clustering Methodologies p. 124
6.3 Algorithms for Clustering Large Data Sets p. 126
6.3.1 CLARANS: Clustering large applications based upon randomized search p. 126
6.3.2 BIRCH: Balanced iterative reducing and clustering using hierarchies p. 126
6.3.3 DBSCAN: Density-based spatial clustering of applications with noise p. 127
6.3.4 STING: Statistical information grid p. 128
6.4 CemmiStri: Clustering using EM, Minimal Spanning Tree and Rough-fuzzy Initialization p. 129
6.4.1 Mixture model estimation via EM algorithm p. 130
6.4.2 Rough set initialization of mixture parameters p. 131
6.4.3 Mapping reducts to mixture parameters p. 132
6.4.4 Graph-theoretic clustering of Gaussian components p. 133
6.5 Experimental Results and Comparison p. 135
6.6 Multispectral Image Segmentation p. 139
6.6.1 Discretization of image bands p. 141
6.6.2 Integration of EM, MST and rough sets p. 141
6.6.3 Index for segmentation quality p. 141
6.6.4 Experimental results and comparison p. 141
6.7 Summary p. 147
7 Rough Self-Organizing Map p. 149
7.1 Introduction p. 149
7.2 Self-Organizing Maps (SOM) p. 150
7.2.1 Learning p. 151
7.2.2 Effect of neighborhood p. 152
7.3 Incorporation of Rough Sets in SOM (RSOM) p. 152
7.3.1 Unsupervised rough set rule generation p. 153
7.3.2 Mapping rough set rules to network weights p. 153
7.4 Rule Generation and Evaluation p. 154
7.4.1 Extraction methodology p. 154
7.4.2 Evaluation indices p. 155
7.5 Experimental Results and Comparison p. 156
7.5.1 Clustering and quantization error p. 157
7.5.2 Performance of rules p. 162
7.6 Summary p. 163
8 Classification, Rule Generation and Evaluation using Modular Rough-fuzzy MLP p. 165
8.1 Introduction p. 165
8.2 Ensemble Classifiers p. 167
8.3 Association Rules p. 170
8.3.1 Rule generation algorithms p. 170
8.3.2 Rule interestingness p. 173
8.4 Classification Rules p. 173
8.5 Rough-fuzzy MLP p. 175
8.5.1 Fuzzy MLP p. 175
8.5.2 Rough set knowledge encoding p. 176
8.6 Modular Evolution of Rough-fuzzy MLP p. 178
8.6.1 Algorithm p. 178
8.6.2 Evolutionary design p. 182
8.7 Rule Extraction and Quantitative Evaluation p. 184
8.7.1 Rule extraction methodology p. 184
8.7.2 Quantitative measures p. 188
8.8 Experimental Results and Comparison p. 189
8.8.1 Classification p. 190
8.8.2 Rule extraction p. 192
8.9 Summary p. 199
A Role of Soft-Computing Tools in KDD p. 201
A.1 Fuzzy Sets p. 201
A.1.1 Clustering p. 202
A.1.2 Association rules p. 203
A.1.3 Functional dependencies p. 204
A.1.4 Data summarization p. 204
A.1.5 Web application p. 205
A.1.6 Image retrieval p. 205
A.2 Neural Networks p. 206
A.2.1 Rule extraction p. 206
A.2.2 Clustering and self organization p. 206
A.2.3 Regression p. 207
A.3 Neuro-fuzzy Computing p. 207
A.4 Genetic Algorithms p. 208
A.5 Rough Sets p. 209
A.6 Other Hybridizations p. 210
B Data Sets Used in Experiments p. 211
References p. 215
Index p. 237
About the Authors p. 243