Title:
Pattern classification using ensemble methods
Personal Author:
Rokach, Lior
Series:
Series in machine perception and artificial intelligence ; 75
Publication Information:
Singapore ; Hackensack, NJ : World Scientific Publishing, 2010
Physical Description:
xv, 225 p. : ill. ; 24 cm.
ISBN:
9789814271066

Available:

Item Barcode: 30000010270383
Call Number: TK7882.P3 R65 2010
Material Type: Book
Item Category 1: Open Access Book

Table of Contents

Preface  p. vii
1 Introduction to Pattern Classification  p. 1
1.1 Pattern Classification  p. 2
1.2 Induction Algorithms  p. 4
1.3 Rule Induction  p. 5
1.4 Decision Trees  p. 5
1.5 Bayesian Methods  p. 8
1.5.1 Overview  p. 8
1.5.2 Naïve Bayes  p. 9
1.5.2.1 The Basic Naïve Bayes Classifier  p. 9
1.5.2.2 Naïve Bayes Induction for Numeric Attributes  p. 12
1.5.2.3 Correction to the Probability Estimation  p. 12
1.5.2.4 Laplace Correction  p. 13
1.5.2.5 No Match  p. 14
1.5.3 Other Bayesian Methods  p. 14
1.6 Other Induction Methods  p. 14
1.6.1 Neural Networks  p. 14
1.6.2 Genetic Algorithms  p. 17
1.6.3 Instance-based Learning  p. 17
1.6.4 Support Vector Machines  p. 18
2 Introduction to Ensemble Learning  p. 19
2.1 Back to the Roots  p. 20
2.2 The Wisdom of Crowds  p. 22
2.3 The Bagging Algorithm  p. 22
2.4 The Boosting Algorithm  p. 28
2.5 The AdaBoost Algorithm  p. 28
2.6 No Free Lunch Theorem and Ensemble Learning  p. 36
2.7 Bias-Variance Decomposition and Ensemble Learning  p. 38
2.8 Occam's Razor and Ensemble Learning  p. 40
2.9 Classifier Dependency  p. 41
2.9.1 Dependent Methods  p. 42
2.9.1.1 Model-guided Instance Selection  p. 42
2.9.1.2 Basic Boosting Algorithms  p. 42
2.9.1.3 Advanced Boosting Algorithms  p. 44
2.9.1.4 Incremental Batch Learning  p. 51
2.9.2 Independent Methods  p. 51
2.9.2.1 Bagging  p. 53
2.9.2.2 Wagging  p. 54
2.9.2.3 Random Forest and Random Subspace Projection  p. 55
2.9.2.4 Non-Linear Boosting Projection (NLBP)  p. 56
2.9.2.5 Cross-validated Committees  p. 58
2.9.2.6 Robust Boosting  p. 59
2.10 Ensemble Methods for Advanced Classification Tasks  p. 61
2.10.1 Cost-Sensitive Classification  p. 61
2.10.2 Ensemble for Learning Concept Drift  p. 63
2.10.3 Reject Driven Classification  p. 63
3 Ensemble Classification  p. 65
3.1 Fusion Methods  p. 65
3.1.1 Weighting Methods  p. 65
3.1.2 Majority Voting  p. 66
3.1.3 Performance Weighting  p. 67
3.1.4 Distribution Summation  p. 68
3.1.5 Bayesian Combination  p. 68
3.1.6 Dempster-Shafer  p. 69
3.1.7 Vogging  p. 69
3.1.8 Naïve Bayes  p. 69
3.1.9 Entropy Weighting  p. 70
3.1.10 Density-based Weighting  p. 70
3.1.11 DEA Weighting Method  p. 70
3.1.12 Logarithmic Opinion Pool  p. 71
3.1.13 Order Statistics  p. 71
3.2 Selecting Classification  p. 71
3.2.1 Partitioning the Instance Space  p. 74
3.2.1.1 The K-Means Algorithm as a Decomposition Tool  p. 75
3.2.1.2 Determining the Number of Subsets  p. 78
3.2.1.3 The Basic K-Classifier Algorithm  p. 78
3.2.1.4 The Heterogeneity Detecting K-Classifier (HDK-Classifier)  p. 81
3.2.1.5 Running-Time Complexity  p. 81
3.3 Mixture of Experts and Meta Learning  p. 82
3.3.1 Stacking  p. 82
3.3.2 Arbiter Trees  p. 85
3.3.3 Combiner Trees  p. 88
3.3.4 Grading  p. 88
3.3.5 Gating Network  p. 89
4 Ensemble Diversity  p. 93
4.1 Overview  p. 93
4.2 Manipulating the Inducer  p. 94
4.2.1 Manipulation of the Inducer's Parameters  p. 95
4.2.2 Starting Point in Hypothesis Space  p. 95
4.2.3 Hypothesis Space Traversal  p. 95
4.3 Manipulating the Training Samples  p. 96
4.3.1 Resampling  p. 96
4.3.2 Creation  p. 97
4.3.3 Partitioning  p. 100
4.4 Manipulating the Target Attribute Representation  p. 101
4.4.1 Label Switching  p. 102
4.5 Partitioning the Search Space  p. 103
4.5.1 Divide and Conquer  p. 104
4.5.2 Feature Subset-based Ensemble Methods  p. 105
4.5.2.1 Random-based Strategy  p. 106
4.5.2.2 Reduct-based Strategy  p. 106
4.5.2.3 Collective-Performance-based Strategy  p. 107
4.5.2.4 Feature Set Partitioning  p. 108
4.5.2.5 Rotation Forest  p. 111
4.6 Multi-Inducers  p. 112
4.7 Measuring the Diversity  p. 114
5 Ensemble Selection  p. 119
5.1 Ensemble Selection  p. 119
5.2 Pre Selection of the Ensemble Size  p. 120
5.3 Selection of the Ensemble Size While Training  p. 120
5.4 Pruning - Post Selection of the Ensemble Size  p. 121
5.4.1 Ranking-based  p. 122
5.4.2 Search based Methods  p. 123
5.4.2.1 Collective Agreement-based Ensemble Pruning Method  p. 124
5.4.3 Clustering-based Methods  p. 129
5.4.4 Pruning Timing  p. 129
5.4.4.1 Pre-combining Pruning  p. 129
5.4.4.2 Post-combining Pruning  p. 130
6 Error Correcting Output Codes  p. 133
6.1 Code-matrix Decomposition of Multiclass Problems  p. 135
6.2 Type I - Training an Ensemble Given a Code-Matrix  p. 136
6.2.1 Error Correcting Output Codes  p. 138
6.2.2 Code-Matrix Framework  p. 139
6.2.3 Code-matrix Design Problem  p. 140
6.2.4 Orthogonal Arrays (OA)  p. 144
6.2.5 Hadamard Matrix  p. 146
6.2.6 Probabilistic Error Correcting Output Code  p. 146
6.2.7 Other ECOC Strategies  p. 147
6.3 Type II - Adapting Code-matrices to the Multiclass Problems  p. 149
7 Evaluating Ensembles of Classifiers  p. 153
7.1 Generalization Error  p. 153
7.1.1 Theoretical Estimation of Generalization Error  p. 154
7.1.2 Empirical Estimation of Generalization Error  p. 155
7.1.3 Alternatives to the Accuracy Measure  p. 157
7.1.4 The F-Measure  p. 158
7.1.5 Confusion Matrix  p. 160
7.1.6 Classifier Evaluation under Limited Resources  p. 161
7.1.6.1 ROC Curves  p. 163
7.1.6.2 Hit Rate Curve  p. 163
7.1.6.3 Qrecall (Quota Recall)  p. 164
7.1.6.4 Lift Curve  p. 164
7.1.6.5 Pearson Correlation Coefficient  p. 165
7.1.6.6 Area Under Curve (AUC)  p. 166
7.1.6.7 Average Hit Rate  p. 167
7.1.6.8 Average Qrecall  p. 168
7.1.6.9 Potential Extract Measure (PEM)  p. 170
7.1.7 Statistical Tests for Comparing Ensembles  p. 172
7.1.7.1 McNemar's Test  p. 173
7.1.7.2 A Test for the Difference of Two Proportions  p. 174
7.1.7.3 The Resampled Paired t Test  p. 175
7.1.7.4 The k-fold Cross-validated Paired t Test  p. 176
7.2 Computational Complexity  p. 176
7.3 Interpretability of the Resulting Ensemble  p. 177
7.4 Scalability to Large Datasets  p. 178
7.5 Robustness  p. 179
7.6 Stability  p. 180
7.7 Flexibility  p. 180
7.8 Usability  p. 180
7.9 Which Ensemble Method Should be Used?  p. 181
Bibliography  p. 185
Index  p. 223