Title:
Computational methods of feature selection
Series:
Chapman & Hall/CRC data mining and knowledge discovery series
Publication Information:
Boca Raton, FL : Chapman & Hall/CRC, 2008
ISBN:
9781584888789

Available Items:

Item Barcode      Call Number            Material Type      Item Category 1
30000010161060    QA76.9.D3 C655 2008    Open Access Book   Book
30000003506502    QA76.9.D3 C655 2008    Open Access Book   Book

Summary

Driven by increasing demands for dimensionality reduction, research on feature selection has expanded deeply and widely into many fields, including computational statistics, pattern recognition, machine learning, data mining, and knowledge discovery. Highlighting current research issues, Computational Methods of Feature Selection introduces the basic concepts and principles, state-of-the-art algorithms, and novel applications of this tool.

The book begins by exploring unsupervised, randomized, and causal feature selection. It then reports on some recent results of empowering feature selection, including active feature selection, decision-border estimate, the use of ensembles with independent probes, and incremental feature selection. This is followed by discussions of weighting and local methods, such as the ReliefF family, k-means clustering, local feature relevance, and a new interpretation of Relief. The book subsequently covers text classification, a new feature selection score, and both constraint-guided and aggressive feature selection. The final section examines applications of feature selection in bioinformatics, including feature construction as well as redundancy-, ensemble-, and penalty-based feature selection.

Through a clear, concise, and coherent presentation of topics, this volume systematically covers the key concepts, underlying principles, and inventive applications of feature selection, illustrating how this powerful tool can efficiently harness massive, high-dimensional data and turn it into valuable, reliable information.


Table of Contents

Contributors: Huan Liu and Hiroshi Motoda; Jennifer G. Dy; David J. Stracuzzi; Isabelle Guyon, Constantin Aliferis, and Andre Elisseeff; Emanuele Olivetti, Sriharsha Veeramachaneni, and Paolo Avesani; Claudia Diamantini and Domenico Potena; Eugene Tuv, Alexander Borisov, and Kari Torkkola; Roberto Ruiz, Jesus S. Aguilar-Ruiz, and Jose C. Riquelme; Igor Kononenko and Marko Robnik Sikonja; Joshua Zhexue Huang, Jun Xu, Michael Ng, and Yunming Ye; Carlotta Domeniconi and Dimitrios Gunopulos; Yijun Sun; George Forman; Susana Eyheramendy and David Madigan; Wei Tang and Shi Zhong; Masoud Makrehchi and Mohamed S. Kamel; Lei Yu; Rezarta Islamaj Dogan, Lise Getoor, and W. John Wilbur; Diana Chan, Susan M. Bridges, and Shane C. Burgess; Hui Zou and Trevor Hastie
I Introduction and Background p. 1
1 Less Is More p. 3
1.1 Background and Basics p. 4
1.2 Supervised, Unsupervised, and Semi-Supervised Feature Selection p. 7
1.3 Key Contributions and Organization of the Book p. 10
1.3.1 Part I - Introduction and Background p. 10
1.3.2 Part II - Extending Feature Selection p. 11
1.3.3 Part III - Weighting and Local Methods p. 12
1.3.4 Part IV - Text Classification and Clustering p. 13
1.3.5 Part V - Feature Selection in Bioinformatics p. 14
1.4 Looking Ahead p. 15
2 Unsupervised Feature Selection p. 19
2.1 Introduction p. 19
2.2 Clustering p. 21
2.2.1 The K-Means Algorithm p. 21
2.2.2 Finite Mixture Clustering p. 22
2.3 Feature Selection p. 23
2.3.1 Feature Search p. 23
2.3.2 Feature Evaluation p. 24
2.4 Feature Selection for Unlabeled Data p. 25
2.4.1 Filter Methods p. 26
2.4.2 Wrapper Methods p. 27
2.5 Local Approaches p. 32
2.5.1 Subspace Clustering p. 32
2.5.2 Co-Clustering/Bi-Clustering p. 33
2.6 Summary p. 34
3 Randomized Feature Selection p. 41
3.1 Introduction p. 41
3.2 Types of Randomizations p. 42
3.3 Randomized Complexity Classes p. 43
3.4 Applying Randomization to Feature Selection p. 45
3.5 The Role of Heuristics p. 46
3.6 Examples of Randomized Selection Algorithms p. 47
3.6.1 A Simple Las Vegas Approach p. 47
3.6.2 Two Simple Monte Carlo Approaches p. 49
3.6.3 Random Mutation Hill Climbing p. 51
3.6.4 Simulated Annealing p. 52
3.6.5 Genetic Algorithms p. 54
3.6.6 Randomized Variable Elimination p. 56
3.7 Issues in Randomization p. 58
3.7.1 Pseudorandom Number Generators p. 58
3.7.2 Sampling from Specialized Data Structures p. 59
3.8 Summary p. 59
4 Causal Feature Selection p. 63
4.1 Introduction p. 63
4.2 Classical "Non-Causal" Feature Selection p. 65
4.3 The Concept of Causality p. 68
4.3.1 Probabilistic Causality p. 69
4.3.2 Causal Bayesian Networks p. 70
4.4 Feature Relevance in Bayesian Networks p. 71
4.4.1 Markov Blanket p. 72
4.4.2 Characterizing Features Selected via Classical Methods p. 73
4.5 Causal Discovery Algorithms p. 77
4.5.1 A Prototypical Causal Discovery Algorithm p. 78
4.5.2 Markov Blanket Induction Algorithms p. 79
4.6 Examples of Applications p. 80
4.7 Summary, Conclusions, and Open Problems p. 82
II Extending Feature Selection p. 87
5 Active Learning of Feature Relevance p. 89
5.1 Introduction p. 89
5.2 Active Sampling for Feature Relevance Estimation p. 92
5.3 Derivation of the Sampling Benefit Function p. 93
5.4 Implementation of the Active Sampling Algorithm p. 95
5.4.1 Data Generation Model: Class-Conditional Mixture of Product Distributions p. 95
5.4.2 Calculation of Feature Relevances p. 96
5.4.3 Calculation of Conditional Probabilities p. 97
5.4.4 Parameter Estimation p. 97
5.5 Experiments p. 99
5.5.1 Synthetic Data p. 99
5.5.2 UCI Datasets p. 100
5.5.3 Computational Complexity Issues p. 102
5.6 Conclusions and Future Work p. 102
6 A Study of Feature Extraction Techniques Based on Decision Border Estimate p. 109
6.1 Introduction p. 109
6.1.1 Background on Statistical Pattern Classification p. 111
6.2 Feature Extraction Based on Decision Boundary p. 112
6.2.1 MLP-Based Decision Boundary Feature Extraction p. 113
6.2.2 SVM Decision Boundary Analysis p. 114
6.3 Generalities About Labeled Vector Quantizers p. 115
6.4 Feature Extraction Based on Vector Quantizers p. 116
6.4.1 Weighting of Normal Vectors p. 119
6.5 Experiments p. 122
6.5.1 Experiment with Synthetic Data p. 122
6.5.2 Experiment with Real Data p. 124
6.6 Conclusions p. 127
7 Ensemble-Based Variable Selection Using Independent Probes p. 131
7.1 Introduction p. 131
7.2 Tree Ensemble Methods in Feature Ranking p. 132
7.3 The Algorithm: Ensemble-Based Ranking Against Independent Probes p. 134
7.4 Experiments p. 137
7.4.1 Benchmark Methods p. 138
7.4.2 Data and Experiments p. 139
7.5 Discussion p. 143
8 Efficient Incremental-Ranked Feature Selection in Massive Data p. 147
8.1 Introduction p. 147
8.2 Related Work p. 148
8.3 Preliminary Concepts p. 150
8.3.1 Relevance p. 150
8.3.2 Redundancy p. 151
8.4 Incremental Performance over Ranking p. 152
8.4.1 Incremental Ranked Usefulness p. 153
8.4.2 Algorithm p. 155
8.5 Experimental Results p. 156
8.6 Conclusions p. 164
III Weighting and Local Methods p. 167
9 Non-Myopic Feature Quality Evaluation with (R)ReliefF p. 169
9.1 Introduction p. 169
9.2 From Impurity to Relief p. 170
9.2.1 Impurity Measures in Classification p. 171
9.2.2 Relief for Classification p. 172
9.3 ReliefF for Classification and RReliefF for Regression p. 175
9.4 Extensions p. 178
9.4.1 ReliefF for Inductive Logic Programming p. 178
9.4.2 Cost-Sensitive ReliefF p. 180
9.4.3 Evaluation of Ordered Features at Value Level p. 181
9.5 Interpretation p. 182
9.5.1 Difference of Probabilities p. 182
9.5.2 Portion of the Explained Concept p. 183
9.6 Implementation Issues p. 184
9.6.1 Time Complexity p. 184
9.6.2 Active Sampling p. 184
9.6.3 Parallelization p. 185
9.7 Applications p. 185
9.7.1 Feature Subset Selection p. 185
9.7.2 Feature Ranking p. 186
9.7.3 Feature Weighing p. 186
9.7.4 Building Tree-Based Models p. 187
9.7.5 Feature Discretization p. 187
9.7.6 Association Rules and Genetic Algorithms p. 187
9.7.7 Constructive Induction p. 188
9.8 Conclusion p. 188
10 Weighting Method for Feature Selection in K-Means p. 193
10.1 Introduction p. 193
10.2 Feature Weighting in k-Means p. 194
10.3 W-k-Means Clustering Algorithm p. 197
10.4 Feature Selection p. 198
10.5 Subspace Clustering with k-Means p. 200
10.6 Text Clustering p. 201
10.6.1 Text Data and Subspace Clustering p. 202
10.6.2 Selection of Key Words p. 203
10.7 Related Work p. 204
10.8 Discussions p. 207
11 Local Feature Selection for Classification p. 211
11.1 Introduction p. 211
11.2 The Curse of Dimensionality p. 213
11.3 Adaptive Metric Techniques p. 214
11.3.1 Flexible Metric Nearest Neighbor Classification p. 215
11.3.2 Discriminant Adaptive Nearest Neighbor Classification p. 216
11.3.3 Adaptive Metric Nearest Neighbor Algorithm p. 217
11.4 Large Margin Nearest Neighbor Classifiers p. 222
11.4.1 Support Vector Machines p. 223
11.4.2 Feature Weighting p. 224
11.4.3 Large Margin Nearest Neighbor Classification p. 225
11.4.4 Weighting Features Increases the Margin p. 227
11.5 Experimental Comparisons p. 228
11.6 Conclusions p. 231
12 Feature Weighting through Local Learning p. 233
12.1 Introduction p. 233
12.2 Mathematical Interpretation of Relief p. 235
12.3 Iterative Relief Algorithm p. 236
12.3.1 Algorithm p. 236
12.3.2 Convergence Analysis p. 238
12.4 Extension to Multiclass Problems p. 240
12.5 Online Learning p. 240
12.6 Computational Complexity p. 242
12.7 Experiments p. 242
12.7.1 Experimental Setup p. 242
12.7.2 Experiments on UCI Datasets p. 244
12.7.3 Choice of Kernel Width p. 248
12.7.4 Online Learning p. 248
12.7.5 Experiments on Microarray Data p. 249
12.8 Conclusion p. 251
IV Text Classification and Clustering p. 255
13 Feature Selection for Text Classification p. 257
13.1 Introduction p. 257
13.1.1 Feature Selection Phyla p. 259
13.1.2 Characteristic Difficulties of Text Classification Tasks p. 260
13.2 Text Feature Generators p. 261
13.2.1 Word Merging p. 261
13.2.2 Word Phrases p. 262
13.2.3 Character N-grams p. 263
13.2.4 Multi-Field Records p. 264
13.2.5 Other Properties p. 264
13.2.6 Feature Values p. 265
13.3 Feature Filtering for Classification p. 265
13.3.1 Binary Classification p. 266
13.3.2 Multi-Class Classification p. 269
13.3.3 Hierarchical Classification p. 270
13.4 Practical and Scalable Computation p. 271
13.5 A Case Study p. 272
13.6 Conclusion and Future Work p. 274
14 A Bayesian Feature Selection Score Based on Naive Bayes Models p. 277
14.1 Introduction p. 277
14.2 Feature Selection Scores p. 279
14.2.1 Posterior Inclusion Probability (PIP) p. 280
14.2.2 Posterior Inclusion Probability (PIP) under a Bernoulli distribution p. 281
14.2.3 Posterior Inclusion Probability (PIPp) under Poisson distributions p. 283
14.2.4 Information Gain (IG) p. 284
14.2.5 Bi-Normal Separation (BNS) p. 285
14.2.6 Chi-Square p. 285
14.2.7 Odds Ratio p. 286
14.2.8 Word Frequency p. 286
14.3 Classification Algorithms p. 286
14.4 Experimental Settings and Results p. 287
14.4.1 Datasets p. 287
14.4.2 Experimental Results p. 288
14.5 Conclusion p. 290
15 Pairwise Constraints-Guided Dimensionality Reduction p. 295
15.1 Introduction p. 295
15.2 Pairwise Constraints-Guided Feature Projection p. 297
15.2.1 Feature Projection p. 298
15.2.2 Projection-Based Semi-supervised Clustering p. 300
15.3 Pairwise Constraints-Guided Co-clustering p. 301
15.4 Experimental Studies p. 302
15.4.1 Experimental Study - I p. 302
15.4.2 Experimental Study - II p. 306
15.4.3 Experimental Study - III p. 309
15.5 Conclusion and Future Work p. 310
16 Aggressive Feature Selection by Feature Ranking p. 313
16.1 Introduction p. 313
16.2 Feature Selection by Feature Ranking p. 314
16.2.1 Multivariate Characteristic of Text Classifiers p. 316
16.2.2 Term Redundancy p. 316
16.3 Proposed Approach to Reducing Term Redundancy p. 320
16.3.1 Stemming, Stopwords, and Low-DF Terms Elimination p. 320
16.3.2 Feature Ranking p. 320
16.3.3 Redundancy Reduction p. 322
16.3.4 Redundancy Removal Algorithm p. 325
16.3.5 Term Redundancy Tree p. 326
16.4 Experimental Results p. 326
16.5 Summary p. 330
V Feature Selection in Bioinformatics p. 335
17 Feature Selection for Genomic Data Analysis p. 337
17.1 Introduction p. 337
17.1.1 Microarray Data and Challenges p. 337
17.1.2 Feature Selection for Microarray Data p. 338
17.2 Redundancy-Based Feature Selection p. 340
17.2.1 Feature Relevance and Redundancy p. 340
17.2.2 An Efficient Framework for Redundancy Analysis p. 343
17.2.3 RBF Algorithm p. 345
17.3 Empirical Study p. 347
17.3.1 Datasets p. 347
17.3.2 Experimental Settings p. 349
17.3.3 Results and Discussion p. 349
17.4 Summary p. 351
18 A Feature Generation Algorithm with Applications to Biological Sequence Classification p. 355
18.1 Introduction p. 355
18.2 Splice-Site Prediction p. 356
18.2.1 The Splice-Site Prediction Problem p. 356
18.2.2 Current Approaches p. 357
18.2.3 Our Approach p. 359
18.3 Feature Generation Algorithm p. 359
18.3.1 Feature Type Analysis p. 360
18.3.2 Feature Selection p. 362
18.3.3 Feature Generation Algorithm (FGA) p. 364
18.4 Experiments and Discussion p. 366
18.4.1 Data Description p. 366
18.4.2 Feature Generation p. 367
18.4.3 Prediction Results for Individual Feature Types p. 369
18.4.4 Splice-Site Prediction with FGA Features p. 370
18.5 Conclusions p. 372
19 An Ensemble Method for Identifying Robust Features for Biomarker Discovery p. 377
19.1 Introduction p. 377
19.2 Biomarker Discovery from Proteome Profiles p. 378
19.3 Challenges of Biomarker Identification p. 380
19.4 Ensemble Method for Feature Selection p. 381
19.5 Feature Selection Ensemble p. 383
19.6 Results and Discussion p. 384
19.7 Conclusion p. 389
20 Model Building and Feature Selection with Genomic Data p. 393
20.1 Introduction p. 393
20.2 Ridge Regression, Lasso, and Bridge p. 394
20.3 Drawbacks of the Lasso p. 396
20.4 The Elastic Net p. 397
20.4.1 Definition p. 397
20.4.2 A Stylized Example p. 399
20.4.3 Computation and Tuning p. 400
20.4.4 Analyzing the Cardiomyopathy Data p. 402
20.5 The Elastic-Net Penalized SVM p. 404
20.5.1 Support Vector Machines p. 404
20.5.2 A New SVM Classifier p. 405
20.6 Sparse Eigen-Genes p. 407
20.6.1 PCA and Eigen-Genes p. 408
20.6.2 Sparse Principal Component Analysis p. 408
20.7 Summary p. 409
Index p. 413