Available:

Item Barcode | Call Number | Material Type | Item Category |
---|---|---|---|
30000010161060 | QA76.9.D3 C655 2008 | Open Access Book | Book |
30000003506502 | QA76.9.D3 C655 2008 | Open Access Book | Book |
Summary
Due to increasing demands for dimensionality reduction, research on feature selection has deeply and widely expanded into many fields, including computational statistics, pattern recognition, machine learning, data mining, and knowledge discovery. Highlighting current research issues, Computational Methods of Feature Selection introduces the basic concepts and principles, state-of-the-art algorithms, and novel applications of this tool.
The book begins by exploring unsupervised, randomized, and causal feature selection. It then reports on some recent results of empowering feature selection, including active feature selection, decision-border estimate, the use of ensembles with independent probes, and incremental feature selection. This is followed by discussions of weighting and local methods, such as the ReliefF family, k-means clustering, local feature relevance, and a new interpretation of Relief. The book subsequently covers text classification, a new feature selection score, and both constraint-guided and aggressive feature selection. The final section examines applications of feature selection in bioinformatics, including feature construction as well as redundancy-, ensemble-, and penalty-based feature selection.
Through a clear, concise, and coherent presentation of topics, this volume systematically covers the key concepts, underlying principles, and inventive applications of feature selection, illustrating how this powerful tool can efficiently harness massive, high-dimensional data and turn it into valuable, reliable information.
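The summary names the ReliefF family among the weighting and local methods the book surveys in Part III. As a concrete illustration of what such a method computes, here is a minimal sketch of the basic Relief weighting idea in Python. It is our own illustrative re-implementation of the well-known scheme, not code from the book; the function name and toy dataset are invented for this example.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Relief-style feature weights: reward features that differ on the
    nearest instance of the other class (the "miss") and penalize features
    that differ on the nearest instance of the same class (the "hit")."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)                      # pick a random instance
        dists = np.abs(X - X[i]).sum(axis=1)     # L1 distances to all others
        dists[i] = np.inf                        # never match the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Toy usage: feature 0 carries the class signal, feature 1 is noise,
# so feature 0 should come out with the larger (positive) weight.
X = np.array([[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 2.0]])
y = np.array([0, 0, 1, 1])
print(relief_weights(X, y, rng=0))
```

The ReliefF and RReliefF variants discussed in Chapter 9 extend this idea to multiple neighbors, noisy data, and regression targets.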
Table of Contents
I Introduction and Background | p. 1 |
1 Less Is More | p. 3 |
1.1 Background and Basics | p. 4 |
1.2 Supervised, Unsupervised, and Semi-Supervised Feature Selection | p. 7 |
1.3 Key Contributions and Organization of the Book | p. 10 |
1.3.1 Part I - Introduction and Background | p. 10 |
1.3.2 Part II - Extending Feature Selection | p. 11 |
1.3.3 Part III - Weighting and Local Methods | p. 12 |
1.3.4 Part IV - Text Classification and Clustering | p. 13 |
1.3.5 Part V - Feature Selection in Bioinformatics | p. 14 |
1.4 Looking Ahead | p. 15 |
2 Unsupervised Feature Selection | p. 19 |
2.1 Introduction | p. 19 |
2.2 Clustering | p. 21 |
2.2.1 The K-Means Algorithm | p. 21 |
2.2.2 Finite Mixture Clustering | p. 22 |
2.3 Feature Selection | p. 23 |
2.3.1 Feature Search | p. 23 |
2.3.2 Feature Evaluation | p. 24 |
2.4 Feature Selection for Unlabeled Data | p. 25 |
2.4.1 Filter Methods | p. 26 |
2.4.2 Wrapper Methods | p. 27 |
2.5 Local Approaches | p. 32 |
2.5.1 Subspace Clustering | p. 32 |
2.5.2 Co-Clustering/Bi-Clustering | p. 33 |
2.6 Summary | p. 34 |
3 Randomized Feature Selection | p. 41 |
3.1 Introduction | p. 41 |
3.2 Types of Randomizations | p. 42 |
3.3 Randomized Complexity Classes | p. 43 |
3.4 Applying Randomization to Feature Selection | p. 45 |
3.5 The Role of Heuristics | p. 46 |
3.6 Examples of Randomized Selection Algorithms | p. 47 |
3.6.1 A Simple Las Vegas Approach | p. 47 |
3.6.2 Two Simple Monte Carlo Approaches | p. 49 |
3.6.3 Random Mutation Hill Climbing | p. 51 |
3.6.4 Simulated Annealing | p. 52 |
3.6.5 Genetic Algorithms | p. 54 |
3.6.6 Randomized Variable Elimination | p. 56 |
3.7 Issues in Randomization | p. 58 |
3.7.1 Pseudorandom Number Generators | p. 58 |
3.7.2 Sampling from Specialized Data Structures | p. 59 |
3.8 Summary | p. 59 |
4 Causal Feature Selection | p. 63 |
4.1 Introduction | p. 63 |
4.2 Classical "Non-Causal" Feature Selection | p. 65 |
4.3 The Concept of Causality | p. 68 |
4.3.1 Probabilistic Causality | p. 69 |
4.3.2 Causal Bayesian Networks | p. 70 |
4.4 Feature Relevance in Bayesian Networks | p. 71 |
4.4.1 Markov Blanket | p. 72 |
4.4.2 Characterizing Features Selected via Classical Methods | p. 73 |
4.5 Causal Discovery Algorithms | p. 77 |
4.5.1 A Prototypical Causal Discovery Algorithm | p. 78 |
4.5.2 Markov Blanket Induction Algorithms | p. 79 |
4.6 Examples of Applications | p. 80 |
4.7 Summary, Conclusions, and Open Problems | p. 82 |
II Extending Feature Selection | p. 87 |
5 Active Learning of Feature Relevance | p. 89 |
5.1 Introduction | p. 89 |
5.2 Active Sampling for Feature Relevance Estimation | p. 92 |
5.3 Derivation of the Sampling Benefit Function | p. 93 |
5.4 Implementation of the Active Sampling Algorithm | p. 95 |
5.4.1 Data Generation Model: Class-Conditional Mixture of Product Distributions | p. 95 |
5.4.2 Calculation of Feature Relevances | p. 96 |
5.4.3 Calculation of Conditional Probabilities | p. 97 |
5.4.4 Parameter Estimation | p. 97 |
5.5 Experiments | p. 99 |
5.5.1 Synthetic Data | p. 99 |
5.5.2 UCI Datasets | p. 100 |
5.5.3 Computational Complexity Issues | p. 102 |
5.6 Conclusions and Future Work | p. 102 |
6 A Study of Feature Extraction Techniques Based on Decision Border Estimate | p. 109 |
6.1 Introduction | p. 109 |
6.1.1 Background on Statistical Pattern Classification | p. 111 |
6.2 Feature Extraction Based on Decision Boundary | p. 112 |
6.2.1 MLP-Based Decision Boundary Feature Extraction | p. 113 |
6.2.2 SVM Decision Boundary Analysis | p. 114 |
6.3 Generalities About Labeled Vector Quantizers | p. 115 |
6.4 Feature Extraction Based on Vector Quantizers | p. 116 |
6.4.1 Weighting of Normal Vectors | p. 119 |
6.5 Experiments | p. 122 |
6.5.1 Experiment with Synthetic Data | p. 122 |
6.5.2 Experiment with Real Data | p. 124 |
6.6 Conclusions | p. 127 |
7 Ensemble-Based Variable Selection Using Independent Probes | p. 131 |
7.1 Introduction | p. 131 |
7.2 Tree Ensemble Methods in Feature Ranking | p. 132 |
7.3 The Algorithm: Ensemble-Based Ranking Against Independent Probes | p. 134 |
7.4 Experiments | p. 137 |
7.4.1 Benchmark Methods | p. 138 |
7.4.2 Data and Experiments | p. 139 |
7.5 Discussion | p. 143 |
8 Efficient Incremental-Ranked Feature Selection in Massive Data | p. 147 |
8.1 Introduction | p. 147 |
8.2 Related Work | p. 148 |
8.3 Preliminary Concepts | p. 150 |
8.3.1 Relevance | p. 150 |
8.3.2 Redundancy | p. 151 |
8.4 Incremental Performance over Ranking | p. 152 |
8.4.1 Incremental Ranked Usefulness | p. 153 |
8.4.2 Algorithm | p. 155 |
8.5 Experimental Results | p. 156 |
8.6 Conclusions | p. 164 |
III Weighting and Local Methods | p. 167 |
9 Non-Myopic Feature Quality Evaluation with (R)ReliefF | p. 169 |
9.1 Introduction | p. 169 |
9.2 From Impurity to Relief | p. 170 |
9.2.1 Impurity Measures in Classification | p. 171 |
9.2.2 Relief for Classification | p. 172 |
9.3 ReliefF for Classification and RReliefF for Regression | p. 175 |
9.4 Extensions | p. 178 |
9.4.1 ReliefF for Inductive Logic Programming | p. 178 |
9.4.2 Cost-Sensitive ReliefF | p. 180 |
9.4.3 Evaluation of Ordered Features at Value Level | p. 181 |
9.5 Interpretation | p. 182 |
9.5.1 Difference of Probabilities | p. 182 |
9.5.2 Portion of the Explained Concept | p. 183 |
9.6 Implementation Issues | p. 184 |
9.6.1 Time Complexity | p. 184 |
9.6.2 Active Sampling | p. 184 |
9.6.3 Parallelization | p. 185 |
9.7 Applications | p. 185 |
9.7.1 Feature Subset Selection | p. 185 |
9.7.2 Feature Ranking | p. 186 |
9.7.3 Feature Weighting | p. 186 |
9.7.4 Building Tree-Based Models | p. 187 |
9.7.5 Feature Discretization | p. 187 |
9.7.6 Association Rules and Genetic Algorithms | p. 187 |
9.7.7 Constructive Induction | p. 188 |
9.8 Conclusion | p. 188 |
10 Weighting Method for Feature Selection in K-Means | p. 193 |
10.1 Introduction | p. 193 |
10.2 Feature Weighting in k-Means | p. 194 |
10.3 W-k-Means Clustering Algorithm | p. 197 |
10.4 Feature Selection | p. 198 |
10.5 Subspace Clustering with k-Means | p. 200 |
10.6 Text Clustering | p. 201 |
10.6.1 Text Data and Subspace Clustering | p. 202 |
10.6.2 Selection of Key Words | p. 203 |
10.7 Related Work | p. 204 |
10.8 Discussions | p. 207 |
11 Local Feature Selection for Classification | p. 211 |
11.1 Introduction | p. 211 |
11.2 The Curse of Dimensionality | p. 213 |
11.3 Adaptive Metric Techniques | p. 214 |
11.3.1 Flexible Metric Nearest Neighbor Classification | p. 215 |
11.3.2 Discriminant Adaptive Nearest Neighbor Classification | p. 216 |
11.3.3 Adaptive Metric Nearest Neighbor Algorithm | p. 217 |
11.4 Large Margin Nearest Neighbor Classifiers | p. 222 |
11.4.1 Support Vector Machines | p. 223 |
11.4.2 Feature Weighting | p. 224 |
11.4.3 Large Margin Nearest Neighbor Classification | p. 225 |
11.4.4 Weighting Features Increases the Margin | p. 227 |
11.5 Experimental Comparisons | p. 228 |
11.6 Conclusions | p. 231 |
12 Feature Weighting through Local Learning | p. 233 |
12.1 Introduction | p. 233 |
12.2 Mathematical Interpretation of Relief | p. 235 |
12.3 Iterative Relief Algorithm | p. 236 |
12.3.1 Algorithm | p. 236 |
12.3.2 Convergence Analysis | p. 238 |
12.4 Extension to Multiclass Problems | p. 240 |
12.5 Online Learning | p. 240 |
12.6 Computational Complexity | p. 242 |
12.7 Experiments | p. 242 |
12.7.1 Experimental Setup | p. 242 |
12.7.2 Experiments on UCI Datasets | p. 244 |
12.7.3 Choice of Kernel Width | p. 248 |
12.7.4 Online Learning | p. 248 |
12.7.5 Experiments on Microarray Data | p. 249 |
12.8 Conclusion | p. 251 |
IV Text Classification and Clustering | p. 255 |
13 Feature Selection for Text Classification | p. 257 |
13.1 Introduction | p. 257 |
13.1.1 Feature Selection Phyla | p. 259 |
13.1.2 Characteristic Difficulties of Text Classification Tasks | p. 260 |
13.2 Text Feature Generators | p. 261 |
13.2.1 Word Merging | p. 261 |
13.2.2 Word Phrases | p. 262 |
13.2.3 Character N-grams | p. 263 |
13.2.4 Multi-Field Records | p. 264 |
13.2.5 Other Properties | p. 264 |
13.2.6 Feature Values | p. 265 |
13.3 Feature Filtering for Classification | p. 265 |
13.3.1 Binary Classification | p. 266 |
13.3.2 Multi-Class Classification | p. 269 |
13.3.3 Hierarchical Classification | p. 270 |
13.4 Practical and Scalable Computation | p. 271 |
13.5 A Case Study | p. 272 |
13.6 Conclusion and Future Work | p. 274 |
14 A Bayesian Feature Selection Score Based on Naive Bayes Models | p. 277 |
14.1 Introduction | p. 277 |
14.2 Feature Selection Scores | p. 279 |
14.2.1 Posterior Inclusion Probability (PIP) | p. 280 |
14.2.2 Posterior Inclusion Probability (PIP) under a Bernoulli distribution | p. 281 |
14.2.3 Posterior Inclusion Probability (PIPp) under Poisson distributions | p. 283 |
14.2.4 Information Gain (IG) | p. 284 |
14.2.5 Bi-Normal Separation (BNS) | p. 285 |
14.2.6 Chi-Square | p. 285 |
14.2.7 Odds Ratio | p. 286 |
14.2.8 Word Frequency | p. 286 |
14.3 Classification Algorithms | p. 286 |
14.4 Experimental Settings and Results | p. 287 |
14.4.1 Datasets | p. 287 |
14.4.2 Experimental Results | p. 288 |
14.5 Conclusion | p. 290 |
15 Pairwise Constraints-Guided Dimensionality Reduction | p. 295 |
15.1 Introduction | p. 295 |
15.2 Pairwise Constraints-Guided Feature Projection | p. 297 |
15.2.1 Feature Projection | p. 298 |
15.2.2 Projection-Based Semi-Supervised Clustering | p. 300 |
15.3 Pairwise Constraints-Guided Co-clustering | p. 301 |
15.4 Experimental Studies | p. 302 |
15.4.1 Experimental Study - I | p. 302 |
15.4.2 Experimental Study - II | p. 306 |
15.4.3 Experimental Study - III | p. 309 |
15.5 Conclusion and Future Work | p. 310 |
16 Aggressive Feature Selection by Feature Ranking | p. 313 |
16.1 Introduction | p. 313 |
16.2 Feature Selection by Feature Ranking | p. 314 |
16.2.1 Multivariate Characteristic of Text Classifiers | p. 316 |
16.2.2 Term Redundancy | p. 316 |
16.3 Proposed Approach to Reducing Term Redundancy | p. 320 |
16.3.1 Stemming, Stopwords, and Low-DF Terms Elimination | p. 320 |
16.3.2 Feature Ranking | p. 320 |
16.3.3 Redundancy Reduction | p. 322 |
16.3.4 Redundancy Removal Algorithm | p. 325 |
16.3.5 Term Redundancy Tree | p. 326 |
16.4 Experimental Results | p. 326 |
16.5 Summary | p. 330 |
V Feature Selection in Bioinformatics | p. 335 |
17 Feature Selection for Genomic Data Analysis | p. 337 |
17.1 Introduction | p. 337 |
17.1.1 Microarray Data and Challenges | p. 337 |
17.1.2 Feature Selection for Microarray Data | p. 338 |
17.2 Redundancy-Based Feature Selection | p. 340 |
17.2.1 Feature Relevance and Redundancy | p. 340 |
17.2.2 An Efficient Framework for Redundancy Analysis | p. 343 |
17.2.3 RBF Algorithm | p. 345 |
17.3 Empirical Study | p. 347 |
17.3.1 Datasets | p. 347 |
17.3.2 Experimental Settings | p. 349 |
17.3.3 Results and Discussion | p. 349 |
17.4 Summary | p. 351 |
18 A Feature Generation Algorithm with Applications to Biological Sequence Classification | p. 355 |
18.1 Introduction | p. 355 |
18.2 Splice-Site Prediction | p. 356 |
18.2.1 The Splice-Site Prediction Problem | p. 356 |
18.2.2 Current Approaches | p. 357 |
18.2.3 Our Approach | p. 359 |
18.3 Feature Generation Algorithm | p. 359 |
18.3.1 Feature Type Analysis | p. 360 |
18.3.2 Feature Selection | p. 362 |
18.3.3 Feature Generation Algorithm (FGA) | p. 364 |
18.4 Experiments and Discussion | p. 366 |
18.4.1 Data Description | p. 366 |
18.4.2 Feature Generation | p. 367 |
18.4.3 Prediction Results for Individual Feature Types | p. 369 |
18.4.4 Splice-Site Prediction with FGA Features | p. 370 |
18.5 Conclusions | p. 372 |
19 An Ensemble Method for Identifying Robust Features for Biomarker Discovery | p. 377 |
19.1 Introduction | p. 377 |
19.2 Biomarker Discovery from Proteome Profiles | p. 378 |
19.3 Challenges of Biomarker Identification | p. 380 |
19.4 Ensemble Method for Feature Selection | p. 381 |
19.5 Feature Selection Ensemble | p. 383 |
19.6 Results and Discussion | p. 384 |
19.7 Conclusion | p. 389 |
20 Model Building and Feature Selection with Genomic Data | p. 393 |
20.1 Introduction | p. 393 |
20.2 Ridge Regression, Lasso, and Bridge | p. 394 |
20.3 Drawbacks of the Lasso | p. 396 |
20.4 The Elastic Net | p. 397 |
20.4.1 Definition | p. 397 |
20.4.2 A Stylized Example | p. 399 |
20.4.3 Computation and Tuning | p. 400 |
20.4.4 Analyzing the Cardiomyopathy Data | p. 402 |
20.5 The Elastic-Net Penalized SVM | p. 404 |
20.5.1 Support Vector Machines | p. 404 |
20.5.2 A New SVM Classifier | p. 405 |
20.6 Sparse Eigen-Genes | p. 407 |
20.6.1 PCA and Eigen-Genes | p. 408 |
20.6.2 Sparse Principal Component Analysis | p. 408 |
20.7 Summary | p. 409 |
Index | p. 413 |