Available:

Item Barcode | Call Number | Material Type | Item Category |
---|---|---|---|
30000010161060 | QA76.9.D3 C655 2008 | Open Access Book | Book |
30000003506502 | QA76.9.D3 C655 2008 | Open Access Book | Book |
Summary
Due to increasing demands for dimensionality reduction, research on feature selection has deeply and widely expanded into many fields, including computational statistics, pattern recognition, machine learning, data mining, and knowledge discovery. Highlighting current research issues, Computational Methods of Feature Selection introduces the basic concepts and principles, state-of-the-art algorithms, and novel applications of this tool.
The book begins by exploring unsupervised, randomized, and causal feature selection. It then reports on some recent results of empowering feature selection, including active feature selection, decision-border estimate, the use of ensembles with independent probes, and incremental feature selection. This is followed by discussions of weighting and local methods, such as the ReliefF family, k-means clustering, local feature relevance, and a new interpretation of Relief. The book subsequently covers text classification, a new feature selection score, and both constraint-guided and aggressive feature selection. The final section examines applications of feature selection in bioinformatics, including feature construction as well as redundancy-, ensemble-, and penalty-based feature selection.
Through a clear, concise, and coherent presentation of topics, this volume systematically covers the key concepts, underlying principles, and inventive applications of feature selection, illustrating how this powerful tool can efficiently harness massive, high-dimensional data and turn it into valuable, reliable information.
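The summary names the ReliefF family among the weighting and local methods the book surveys in Part III. As a concrete illustration of what such a method computes, here is a minimal sketch of the basic Relief weighting idea in Python. It is our own illustrative re-implementation of the well-known scheme, not code from the book; the function name and toy dataset are invented for this example.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Relief-style feature weights: reward features that differ on the
    nearest instance of the other class (the "miss") and penalize features
    that differ on the nearest instance of the same class (the "hit")."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)                      # pick a random instance
        dists = np.abs(X - X[i]).sum(axis=1)     # L1 distances to all others
        dists[i] = np.inf                        # never match the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Toy usage: feature 0 carries the class signal, feature 1 is noise,
# so feature 0 should come out with the larger (positive) weight.
X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 2.0]])
y = np.array([0, 0, 1, 1])
print(relief_weights(X, y, rng=0))
```

The ReliefF and RReliefF variants discussed in Chapter 9 extend this idea to multiple neighbors, noisy data, and regression targets.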
Table of Contents
I Introduction and Background | p. 1 |
1 Less Is More | p. 3 |
1.1 Background and Basics | p. 4 |
1.2 Supervised, Unsupervised, and Semi-Supervised Feature Selection | p. 7 |
1.3 Key Contributions and Organization of the Book | p. 10 |
1.3.1 Part I - Introduction and Background | p. 10 |
1.3.2 Part II - Extending Feature Selection | p. 11 |
1.3.3 Part III - Weighting and Local Methods | p. 12 |
1.3.4 Part IV - Text Classification and Clustering | p. 13 |
1.3.5 Part V - Feature Selection in Bioinformatics | p. 14 |
1.4 Looking Ahead | p. 15 |
2 Unsupervised Feature Selection | p. 19 |
2.1 Introduction | p. 19 |
2.2 Clustering | p. 21 |
2.2.1 The K-Means Algorithm | p. 21 |
2.2.2 Finite Mixture Clustering | p. 22 |
2.3 Feature Selection | p. 23 |
2.3.1 Feature Search | p. 23 |
2.3.2 Feature Evaluation | p. 24 |
2.4 Feature Selection for Unlabeled Data | p. 25 |
2.4.1 Filter Methods | p. 26 |
2.4.2 Wrapper Methods | p. 27 |
2.5 Local Approaches | p. 32 |
2.5.1 Subspace Clustering | p. 32 |
2.5.2 Co-Clustering/Bi-Clustering | p. 33 |
2.6 Summary | p. 34 |
3 Randomized Feature Selection | p. 41 |
3.1 Introduction | p. 41 |
3.2 Types of Randomizations | p. 42 |
3.3 Randomized Complexity Classes | p. 43 |
3.4 Applying Randomization to Feature Selection | p. 45 |
3.5 The Role of Heuristics | p. 46 |
3.6 Examples of Randomized Selection Algorithms | p. 47 |
3.6.1 A Simple Las Vegas Approach | p. 47 |
3.6.2 Two Simple Monte Carlo Approaches | p. 49 |
3.6.3 Random Mutation Hill Climbing | p. 51 |
3.6.4 Simulated Annealing | p. 52 |
3.6.5 Genetic Algorithms | p. 54 |
3.6.6 Randomized Variable Elimination | p. 56 |
3.7 Issues in Randomization | p. 58 |
3.7.1 Pseudorandom Number Generators | p. 58 |
3.7.2 Sampling from Specialized Data Structures | p. 59 |
3.8 Summary | p. 59 |
4 Causal Feature Selection | p. 63 |
4.1 Introduction | p. 63 |
4.2 Classical "Non-Causal" Feature Selection | p. 65 |
4.3 The Concept of Causality | p. 68 |
4.3.1 Probabilistic Causality | p. 69 |
4.3.2 Causal Bayesian Networks | p. 70 |
4.4 Feature Relevance in Bayesian Networks | p. 71 |
4.4.1 Markov Blanket | p. 72 |
4.4.2 Characterizing Features Selected via Classical Methods | p. 73 |
4.5 Causal Discovery Algorithms | p. 77 |
4.5.1 A Prototypical Causal Discovery Algorithm | p. 78 |
4.5.2 Markov Blanket Induction Algorithms | p. 79 |
4.6 Examples of Applications | p. 80 |
4.7 Summary, Conclusions, and Open Problems | p. 82 |
II Extending Feature Selection | p. 87 |
5 Active Learning of Feature Relevance | p. 89 |
5.1 Introduction | p. 89 |
5.2 Active Sampling for Feature Relevance Estimation | p. 92 |
5.3 Derivation of the Sampling Benefit Function | p. 93 |
5.4 Implementation of the Active Sampling Algorithm | p. 95 |
5.4.1 Data Generation Model: Class-Conditional Mixture of Product Distributions | p. 95 |
5.4.2 Calculation of Feature Relevances | p. 96 |
5.4.3 Calculation of Conditional Probabilities | p. 97 |
5.4.4 Parameter Estimation | p. 97 |
5.5 Experiments | p. 99 |
5.5.1 Synthetic Data | p. 99 |
5.5.2 UCI Datasets | p. 100 |
5.5.3 Computational Complexity Issues | p. 102 |
5.6 Conclusions and Future Work | p. 102 |
6 A Study of Feature Extraction Techniques Based on Decision Border Estimate | p. 109 |
6.1 Introduction | p. 109 |
6.1.1 Background on Statistical Pattern Classification | p. 111 |
6.2 Feature Extraction Based on Decision Boundary | p. 112 |
6.2.1 MLP-Based Decision Boundary Feature Extraction | p. 113 |
6.2.2 SVM Decision Boundary Analysis | p. 114 |
6.3 Generalities About Labeled Vector Quantizers | p. 115 |
6.4 Feature Extraction Based on Vector Quantizers | p. 116 |
6.4.1 Weighting of Normal Vectors | p. 119 |
6.5 Experiments | p. 122 |
6.5.1 Experiment with Synthetic Data | p. 122 |
6.5.2 Experiment with Real Data | p. 124 |
6.6 Conclusions | p. 127 |
7 Ensemble-Based Variable Selection Using Independent Probes | p. 131 |
7.1 Introduction | p. 131 |
7.2 Tree Ensemble Methods in Feature Ranking | p. 132 |
7.3 The Algorithm: Ensemble-Based Ranking Against Independent Probes | p. 134 |
7.4 Experiments | p. 137 |
7.4.1 Benchmark Methods | p. 138 |
7.4.2 Data and Experiments | p. 139 |
7.5 Discussion | p. 143 |
8 Efficient Incremental-Ranked Feature Selection in Massive Data | p. 147 |
8.1 Introduction | p. 147 |
8.2 Related Work | p. 148 |
8.3 Preliminary Concepts | p. 150 |
8.3.1 Relevance | p. 150 |
8.3.2 Redundancy | p. 151 |
8.4 Incremental Performance over Ranking | p. 152 |
8.4.1 Incremental Ranked Usefulness | p. 153 |
8.4.2 Algorithm | p. 155 |
8.5 Experimental Results | p. 156 |
8.6 Conclusions | p. 164 |
III Weighting and Local Methods | p. 167 |
9 Non-Myopic Feature Quality Evaluation with (R)ReliefF | p. 169 |
9.1 Introduction | p. 169 |
9.2 From Impurity to Relief | p. 170 |
9.2.1 Impurity Measures in Classification | p. 171 |
9.2.2 Relief for Classification | p. 172 |
9.3 ReliefF for Classification and RReliefF for Regression | p. 175 |
9.4 Extensions | p. 178 |
9.4.1 ReliefF for Inductive Logic Programming | p. 178 |
9.4.2 Cost-Sensitive ReliefF | p. 180 |
9.4.3 Evaluation of Ordered Features at Value Level | p. 181 |
9.5 Interpretation | p. 182 |
9.5.1 Difference of Probabilities | p. 182 |
9.5.2 Portion of the Explained Concept | p. 183 |
9.6 Implementation Issues | p. 184 |
9.6.1 Time Complexity | p. 184 |
9.6.2 Active Sampling | p. 184 |
9.6.3 Parallelization | p. 185 |
9.7 Applications | p. 185 |
9.7.1 Feature Subset Selection | p. 185 |
9.7.2 Feature Ranking | p. 186 |
9.7.3 Feature Weighting | p. 186 |
9.7.4 Building Tree-Based Models | p. 187 |
9.7.5 Feature Discretization | p. 187 |
9.7.6 Association Rules and Genetic Algorithms | p. 187 |
9.7.7 Constructive Induction | p. 188 |
9.8 Conclusion | p. 188 |
10 Weighting Method for Feature Selection in K-Means | p. 193 |
10.1 Introduction | p. 193 |
10.2 Feature Weighting in k-Means | p. 194 |
10.3 W-k-Means Clustering Algorithm | p. 197 |
10.4 Feature Selection | p. 198 |
10.5 Subspace Clustering with k-Means | p. 200 |
10.6 Text Clustering | p. 201 |
10.6.1 Text Data and Subspace Clustering | p. 202 |
10.6.2 Selection of Key Words | p. 203 |
10.7 Related Work | p. 204 |
10.8 Discussions | p. 207 |
11 Local Feature Selection for Classification | p. 211 |
11.1 Introduction | p. 211 |
11.2 The Curse of Dimensionality | p. 213 |
11.3 Adaptive Metric Techniques | p. 214 |
11.3.1 Flexible Metric Nearest Neighbor Classification | p. 215 |
11.3.2 Discriminant Adaptive Nearest Neighbor Classification | p. 216 |
11.3.3 Adaptive Metric Nearest Neighbor Algorithm | p. 217 |
11.4 Large Margin Nearest Neighbor Classifiers | p. 222 |
11.4.1 Support Vector Machines | p. 223 |
11.4.2 Feature Weighting | p. 224 |
11.4.3 Large Margin Nearest Neighbor Classification | p. 225 |
11.4.4 Weighting Features Increases the Margin | p. 227 |
11.5 Experimental Comparisons | p. 228 |
11.6 Conclusions | p. 231 |
12 Feature Weighting through Local Learning | p. 233 |
12.1 Introduction | p. 233 |
12.2 Mathematical Interpretation of Relief | p. 235 |
12.3 Iterative Relief Algorithm | p. 236 |
12.3.1 Algorithm | p. 236 |
12.3.2 Convergence Analysis | p. 238 |
12.4 Extension to Multiclass Problems | p. 240 |
12.5 Online Learning | p. 240 |
12.6 Computational Complexity | p. 242 |
12.7 Experiments | p. 242 |
12.7.1 Experimental Setup | p. 242 |
12.7.2 Experiments on UCI Datasets | p. 244 |
12.7.3 Choice of Kernel Width | p. 248 |
12.7.4 Online Learning | p. 248 |
12.7.5 Experiments on Microarray Data | p. 249 |
12.8 Conclusion | p. 251 |
IV Text Classification and Clustering | p. 255 |
13 Feature Selection for Text Classification | p. 257 |
13.1 Introduction | p. 257 |
13.1.1 Feature Selection Phyla | p. 259 |
13.1.2 Characteristic Difficulties of Text Classification Tasks | p. 260 |
13.2 Text Feature Generators | p. 261 |
13.2.1 Word Merging | p. 261 |
13.2.2 Word Phrases | p. 262 |
13.2.3 Character N-grams | p. 263 |
13.2.4 Multi-Field Records | p. 264 |
13.2.5 Other Properties | p. 264 |
13.2.6 Feature Values | p. 265 |
13.3 Feature Filtering for Classification | p. 265 |
13.3.1 Binary Classification | p. 266 |
13.3.2 Multi-Class Classification | p. 269 |
13.3.3 Hierarchical Classification | p. 270 |
13.4 Practical and Scalable Computation | p. 271 |
13.5 A Case Study | p. 272 |
13.6 Conclusion and Future Work | p. 274 |
14 A Bayesian Feature Selection Score Based on Naive Bayes Models | p. 277 |
14.1 Introduction | p. 277 |
14.2 Feature Selection Scores | p. 279 |
14.2.1 Posterior Inclusion Probability (PIP) | p. 280 |
14.2.2 Posterior Inclusion Probability (PIP) under a Bernoulli distribution | p. 281 |
14.2.3 Posterior Inclusion Probability (PIPp) under Poisson distributions | p. 283 |
14.2.4 Information Gain (IG) | p. 284 |
14.2.5 Bi-Normal Separation (BNS) | p. 285 |
14.2.6 Chi-Square | p. 285 |
14.2.7 Odds Ratio | p. 286 |
14.2.8 Word Frequency | p. 286 |
14.3 Classification Algorithms | p. 286 |
14.4 Experimental Settings and Results | p. 287 |
14.4.1 Datasets | p. 287 |
14.4.2 Experimental Results | p. 288 |
14.5 Conclusion | p. 290 |
15 Pairwise Constraints-Guided Dimensionality Reduction | p. 295 |
15.1 Introduction | p. 295 |
15.2 Pairwise Constraints-Guided Feature Projection | p. 297 |
15.2.1 Feature Projection | p. 298 |
15.2.2 Projection-Based Semi-Supervised Clustering | p. 300 |
15.3 Pairwise Constraints-Guided Co-clustering | p. 301 |
15.4 Experimental Studies | p. 302 |
15.4.1 Experimental Study - I | p. 302 |
15.4.2 Experimental Study - II | p. 306 |
15.4.3 Experimental Study - III | p. 309 |
15.5 Conclusion and Future Work | p. 310 |
16 Aggressive Feature Selection by Feature Ranking | p. 313 |
16.1 Introduction | p. 313 |
16.2 Feature Selection by Feature Ranking | p. 314 |
16.2.1 Multivariate Characteristic of Text Classifiers | p. 316 |
16.2.2 Term Redundancy | p. 316 |
16.3 Proposed Approach to Reducing Term Redundancy | p. 320 |
16.3.1 Stemming, Stopwords, and Low-DF Terms Elimination | p. 320 |
16.3.2 Feature Ranking | p. 320 |
16.3.3 Redundancy Reduction | p. 322 |
16.3.4 Redundancy Removal Algorithm | p. 325 |
16.3.5 Term Redundancy Tree | p. 326 |
16.4 Experimental Results | p. 326 |
16.5 Summary | p. 330 |
V Feature Selection in Bioinformatics | p. 335 |
17 Feature Selection for Genomic Data Analysis | p. 337 |
17.1 Introduction | p. 337 |
17.1.1 Microarray Data and Challenges | p. 337 |
17.1.2 Feature Selection for Microarray Data | p. 338 |
17.2 Redundancy-Based Feature Selection | p. 340 |
17.2.1 Feature Relevance and Redundancy | p. 340 |
17.2.2 An Efficient Framework for Redundancy Analysis | p. 343 |
17.2.3 RBF Algorithm | p. 345 |
17.3 Empirical Study | p. 347 |
17.3.1 Datasets | p. 347 |
17.3.2 Experimental Settings | p. 349 |
17.3.3 Results and Discussion | p. 349 |
17.4 Summary | p. 351 |
18 A Feature Generation Algorithm with Applications to Biological Sequence Classification | p. 355 |
18.1 Introduction | p. 355 |
18.2 Splice-Site Prediction | p. 356 |
18.2.1 The Splice-Site Prediction Problem | p. 356 |
18.2.2 Current Approaches | p. 357 |
18.2.3 Our Approach | p. 359 |
18.3 Feature Generation Algorithm | p. 359 |
18.3.1 Feature Type Analysis | p. 360 |
18.3.2 Feature Selection | p. 362 |
18.3.3 Feature Generation Algorithm (FGA) | p. 364 |
18.4 Experiments and Discussion | p. 366 |
18.4.1 Data Description | p. 366 |
18.4.2 Feature Generation | p. 367 |
18.4.3 Prediction Results for Individual Feature Types | p. 369 |
18.4.4 Splice-Site Prediction with FGA Features | p. 370 |
18.5 Conclusions | p. 372 |
19 An Ensemble Method for Identifying Robust Features for Biomarker Discovery | p. 377 |
19.1 Introduction | p. 377 |
19.2 Biomarker Discovery from Proteome Profiles | p. 378 |
19.3 Challenges of Biomarker Identification | p. 380 |
19.4 Ensemble Method for Feature Selection | p. 381 |
19.5 Feature Selection Ensemble | p. 383 |
19.6 Results and Discussion | p. 384 |
19.7 Conclusion | p. 389 |
20 Model Building and Feature Selection with Genomic Data | p. 393 |
20.1 Introduction | p. 393 |
20.2 Ridge Regression, Lasso, and Bridge | p. 394 |
20.3 Drawbacks of the Lasso | p. 396 |
20.4 The Elastic Net | p. 397 |
20.4.1 Definition | p. 397 |
20.4.2 A Stylized Example | p. 399 |
20.4.3 Computation and Tuning | p. 400 |
20.4.4 Analyzing the Cardiomyopathy Data | p. 402 |
20.5 The Elastic-Net Penalized SVM | p. 404 |
20.5.1 Support Vector Machines | p. 404 |
20.5.2 A New SVM Classifier | p. 405 |
20.6 Sparse Eigen-Genes | p. 407 |
20.6.1 PCA and Eigen-Genes | p. 408 |
20.6.2 Sparse Principal Component Analysis | p. 408 |
20.7 Summary | p. 409 |
Index | p. 413 |