Available:

Item Barcode | Call Number | Material Type | Item Category |
---|---|---|---|
30000004583815 | QA76.9.T48 J62 2002 | Open Access Book | Book |
Summary
Based on ideas from Support Vector Machines (SVMs), Learning To Classify Text Using Support Vector Machines presents a new approach to generating text classifiers from examples. The approach combines high performance and efficiency with theoretical understanding and improved robustness. In particular, it is highly effective without greedy heuristic components. The SVM approach is computationally efficient in training and classification, and it comes with a learning theory that can guide real-world applications.
Learning To Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers, including training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. In addition, it includes an overview of the field of text classification, making it self-contained even for newcomers to the field. This book gives a concise introduction to SVMs for pattern recognition, and it includes a detailed description of how to formulate text-classification tasks for machine learning.
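The overall workflow the book formalizes (bag-of-words document vectors, TF-IDF term weighting, and a linear soft-margin SVM classifier) can be illustrated with a short sketch. The example below does not use the author's SVM-Light package; it is a minimal, hypothetical illustration in Python using scikit-learn, with a toy corpus and placeholder labels.

```python
# Minimal sketch of linear-SVM text classification in the spirit of the book:
# sparse bag-of-words features, TF-IDF term weighting, linear soft-margin SVM.
# Uses scikit-learn rather than the author's SVM-Light; the data is a toy placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy training corpus with binary labels (1 = finance, 0 = sports).
train_docs = [
    "stock markets rallied on strong earnings",
    "the team won the championship game",
    "central bank raises interest rates",
    "star striker scores twice in the final",
]
train_labels = [1, 0, 1, 0]

# TfidfVectorizer maps each document to a sparse, high-dimensional vector;
# LinearSVC learns a maximum-margin separating hyperplane over those vectors.
classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_docs, train_labels)

print(classifier.predict(["interest rates and bond markets"]))  # likely [1]
```

The pipeline mirrors the document representation and term-weighting choices discussed in Chapter 2 and the soft-margin SVM of Chapter 3; everything beyond that mapping (library, corpus, labels) is an assumption of this sketch.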
Table of Contents
Foreword | p. xi |
Preface | p. xiii |
Acknowledgments | p. xv |
Notation | p. xvii |
1. Introduction | p. 1 |
1 Challenges | p. 2 |
2 Goals | p. 3 |
3 Overview and Structure of the Argument | p. 4 |
3.1 Theory | p. 4 |
3.2 Methods | p. 5 |
3.3 Algorithms | p. 6 |
4 Summary | p. 6 |
2. Text Classification | p. 7 |
1 Learning Task | p. 7 |
1.1 Binary Setting | p. 8 |
1.2 Multi-Class Setting | p. 9 |
1.3 Multi-Label Setting | p. 10 |
2 Representing Text | p. 12 |
2.1 Word Level | p. 13 |
2.2 Sub-Word Level | p. 15 |
2.3 Multi-Word Level | p. 15 |
2.4 Semantic Level | p. 16 |
3 Feature Selection | p. 16 |
3.1 Feature Subset Selection | p. 17 |
3.2 Feature Construction | p. 19 |
4 Term Weighting | p. 20 |
5 Conventional Learning Methods | p. 22 |
5.1 Naive Bayes Classifier | p. 22 |
5.2 Rocchio Algorithm | p. 24 |
5.3 k-Nearest Neighbors | p. 25 |
5.4 Decision Tree Classifier | p. 25 |
5.5 Other Methods | p. 26 |
6 Performance Measures | p. 27 |
6.1 Error Rate and Asymmetric Cost | p. 28 |
6.2 Precision and Recall | p. 29 |
6.3 Precision/Recall Breakeven Point and Fβ-Measure | p. 30 |
6.4 Micro- and Macro-Averaging | p. 30 |
7 Experimental Setup | p. 31 |
7.1 Test Collections | p. 31 |
7.2 Design Choices | p. 32 |
3. Support Vector Machines | p. 35 |
1 Linear Hard-Margin SVMs | p. 36 |
2 Soft-Margin SVMs | p. 39 |
3 Non-Linear SVMs | p. 41 |
4 Asymmetric Misclassification Cost | p. 43 |
5 Other Maximum-Margin Methods | p. 43 |
6 Further Work and Further Information | p. 44 |
Part Theory | |
4. A Statistical Learning Model of Text Classification for SVMs | p. 45 |
1 Properties of Text-Classification Tasks | p. 46 |
1.1 High-Dimensional Feature Space | p. 46 |
1.2 Sparse Document Vectors | p. 47 |
1.3 Heterogeneous Use of Terms | p. 47 |
1.4 High Level of Redundancy | p. 48 |
1.5 Frequency Distribution of Words and Zipf's Law | p. 49 |
2 A Discriminative Model of Text Classification | p. 51 |
2.1 Step 1: Bounding the Expected Error Based on the Margin | p. 51 |
2.2 Step 2: Homogeneous TCat-Concepts as a Model of Text-Classification Tasks | p. 53 |
2.3 Step 3: Learnability of TCat-Concepts | p. 59 |
3 Comparing the Theoretical Model with Experimental Results | p. 64 |
4 Sensitivity Analysis: Difficult and Easy Learning Tasks | p. 66 |
4.1 Influence of Occurrence Frequency | p. 66 |
4.2 Discriminative Power of Term Sets | p. 68 |
4.3 Level of Redundancy | p. 68 |
5 Noisy TCat-Concepts | p. 69 |
6 Limitations of the Model and Open Questions | p. 72 |
7 Related Work | p. 72 |
8 Summary and Conclusions | p. 74 |
5. Efficient Performance Estimators for SVMs | p. 75 |
1 Generic Performance Estimators | p. 76 |
1.1 Training Error | p. 76 |
1.2 Hold-Out Testing | p. 77 |
1.3 Bootstrap and Jackknife | p. 78 |
1.4 Cross-Validation and Leave-One-Out | p. 79 |
2 ξα-Estimators | p. 81 |
2.1 Error Rate | p. 82 |
2.2 Recall, Precision, and F1 | p. 89 |
3 Fast Leave-One-Out Estimation | p. 93 |
4 Experiments | p. 94 |
4.1 How Large are Bias and Variance of the ξα-Estimators? | p. 95 |
4.2 What is the Influence of the Training Set Size? | p. 99 |
4.3 How Large is the Efficiency Improvement for Exact Leave-One-Out? | p. 101 |
5 Summary and Conclusions | p. 102 |
Part Methods | |
6. Inductive Text Classification | p. 103 |
1 Learning Task | p. 104 |
2 Automatic Model and Parameter Selection | p. 105 |
2.1 Leave-One-Out Estimator of the PRBEP | p. 106 |
2.2 ξα-Estimator of the PRBEP | p. 106 |
2.3 Model-Selection Algorithm | p. 108 |
3 Experiments | p. 108 |
3.1 Word Weighting, Stemming and Stopword Removal | p. 108 |
3.2 Trading Off Training Error vs. Complexity | p. 111 |
3.3 Non-Linear Classification Rules | p. 113 |
3.4 Comparison with Conventional Methods | p. 113 |
4 Related Work | p. 116 |
5 Summary and Conclusions | p. 117 |
7. Transductive Text Classification | p. 119 |
1 Learning Task | p. 120 |
2 Transductive Support Vector Machines | p. 121 |
3 What Makes TSVMs well Suited for Text Classification? | p. 123 |
3.1 An Intuitive Example | p. 123 |
3.2 Transductive Learning of TCat-Concepts | p. 125 |
4 Experiments | p. 127 |
5 Constraints on the Transductive Hyperplane | p. 130 |
6 Relation to Other Approaches Using Unlabeled Data | p. 133 |
6.1 Probabilistic Approaches using EM | p. 133 |
6.2 Co-Training | p. 134 |
6.3 Other Work on Transduction | p. 139 |
7 Summary and Conclusions | p. 139 |
Part Algorithms | |
8. Training Inductive Support Vector Machines | p. 141 |
1 Problem and Approach | p. 142 |
2 General Decomposition Algorithm | p. 143 |
3 Selecting a Good Working Set | p. 145 |
3.1 Convergence | p. 145 |
3.2 How to Compute the Working Set | p. 146 |
4 Shrinking: Reducing the Number of Variables | p. 146 |
5 Efficient Implementation | p. 148 |
5.1 Termination Criteria | p. 148 |
5.2 Computing the Gradient and the Termination Criteria Efficiently | p. 149 |
5.3 What are the Computational Resources Needed in each Iteration? | p. 150 |
5.4 Caching Kernel Evaluations | p. 151 |
5.5 How to Solve the QP on the Working Set | p. 152 |
6 Related Work | p. 152 |
7 Experiments | p. 154 |
7.1 Training Times for Reuters, WebKB, and Ohsumed | p. 154 |
7.2 How does Training Time Scale with the Number of Training Examples? | p. 154 |
7.3 What is the Influence of the Working-Set-Selection Strategy? | p. 160 |
7.4 What is the Influence of Caching? | p. 161 |
7.5 What is the Influence of Shrinking? | p. 161 |
8 Summary and Conclusions | p. 162 |
9. Training Transductive Support Vector Machines | p. 163 |
1 Problem and Approach | p. 163 |
2 The TSVM Algorithm | p. 165 |
3 Analysis of the Algorithm | p. 166 |
3.1 How does the Algorithm work? | p. 166 |
3.2 Convergence | p. 168 |
4 Experiments | p. 169 |
4.1 Does the Algorithm Effectively Maximize Margin? | p. 169 |
4.2 Training Times for Reuters, WebKB, and Ohsumed | p. 170 |
4.3 How does Training Time Scale with the Number of Training Examples? | p. 170 |
4.4 How does Training Time Scale with the Number of Test Examples? | p. 172 |
5 Related Work | p. 172 |
6 Summary and Conclusions | p. 174 |
10. Conclusions | p. 175 |
1 Open Questions | p. 177 |
Bibliography | p. 180 |
Appendices | p. 197 |
SVM-Light Commands and Options | p. 197 |
Index | p. 203 |