Title:
Learning to classify text using support vector machines
Personal Author:
Joachims, Thorsten
Series:
Kluwer international series in engineering and computer science ; 668
Publication Information:
Boston, MA : Kluwer Academic Publishers, 2002
ISBN:
9780792376798

Available:

Item Barcode: 30000004583815
Call Number: QA76.9.T48 J62 2002
Material Type: Open Access Book
Item Category 1: Book


Summary

Based on ideas from Support Vector Machines (SVMs), Learning To Classify Text Using Support Vector Machines presents a new approach to generating text classifiers from examples. The approach combines high performance and efficiency with theoretical understanding and improved robustness. In particular, it is highly effective without greedy heuristic components. The SVM approach is computationally efficient in training and classification, and it comes with a learning theory that can guide real-world applications.

Learning To Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers, including training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. In addition, it includes an overview of the field of text classification, making it self-contained even for newcomers to the field. This book gives a concise introduction to SVMs for pattern recognition, and it includes a detailed description of how to formulate text-classification tasks for machine learning.
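The formulation the book describes, documents mapped to weighted bag-of-words vectors and separated by a large-margin hyperplane, can be sketched in a few lines of code. The following is a minimal illustration, not the book's own code: it assumes scikit-learn's TfidfVectorizer and LinearSVC as stand-ins for the TF-IDF term weighting (Chapter 2) and the linear soft-margin SVM (Chapter 3), and the toy documents and labels are invented.

# Minimal sketch of SVM text classification, assuming scikit-learn.
# The toy corpus and labels below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny binary task: "about machine learning" (+1) vs. "other" (-1).
train_docs = [
    "support vector machines learn a maximum-margin hyperplane",
    "kernel methods map documents into high-dimensional feature spaces",
    "the stock market fell sharply in early trading today",
    "the recipe calls for two cups of flour and one egg",
]
train_labels = [1, 1, -1, -1]

# Represent each document as a sparse TF-IDF term-weight vector
# (the word-level representation with term weighting from Chapter 2).
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# Train a linear soft-margin SVM; the parameter C trades training
# error off against margin size (Chapter 3).
clf = LinearSVC(C=1.0)
clf.fit(X_train, train_labels)

# Classify new text by the sign of the learned decision function.
test_docs = ["transductive svms exploit unlabeled documents"]
print(clf.predict(vectorizer.transform(test_docs)))  # e.g. prints [1]

Even on a toy corpus like this, the sketch mirrors the pipeline the book analyzes: high-dimensional, sparse document vectors and a linear classifier trained to maximize margin.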


Table of Contents

Foreword (Prof. Tom Mitchell and Prof. Katharina Morik) p. xi
Preface p. xiii
Acknowledgments p. xv
Notation p. xvii
1. Introduction p. 1
1 Challenges p. 2
2 Goals p. 3
3 Overview and Structure of the Argument p. 4
3.1 Theory p. 4
3.2 Methods p. 5
3.3 Algorithms p. 6
4 Summary p. 6
2. Text Classification p. 7
1 Learning Task p. 7
1.1 Binary Setting p. 8
1.2 Multi-Class Setting p. 9
1.3 Multi-Label Setting p. 10
2 Representing Text p. 12
2.1 Word Level p. 13
2.2 Sub-Word Level p. 15
2.3 Multi-Word Level p. 15
2.4 Semantic Level p. 16
3 Feature Selection p. 16
3.1 Feature Subset Selection p. 17
3.2 Feature Construction p. 19
4 Term Weighting p. 20
5 Conventional Learning Methods p. 22
5.1 Naive Bayes Classifier p. 22
5.2 Rocchio Algorithm p. 24
5.3 k-Nearest Neighbors p. 25
5.4 Decision Tree Classifier p. 25
5.5 Other Methods p. 26
6 Performance Measures p. 27
6.1 Error Rate and Asymmetric Cost p. 28
6.2 Precision and Recall p. 29
6.3 Precision/Recall Breakeven Point and Fβ-Measure p. 30
6.4 Micro- and Macro-Averaging p. 30
7 Experimental Setup p. 31
7.1 Test Collections p. 31
7.2 Design Choices p. 32
3. Support Vector Machines p. 35
1 Linear Hard-Margin SVMs p. 36
2 Soft-Margin SVMs p. 39
3 Non-Linear SVMs p. 41
4 Asymmetric Misclassification Cost p. 43
5 Other Maximum-Margin Methods p. 43
6 Further Work and Further Information p. 44
Part I: Theory
4. A Statistical Learning Model of Text Classification for SVMs p. 45
1 Properties of Text-Classification Tasks p. 46
1.1 High-Dimensional Feature Space p. 46
1.2 Sparse Document Vectors p. 47
1.3 Heterogeneous Use of Terms p. 47
1.4 High Level of Redundancy p. 48
1.5 Frequency Distribution of Words and Zipf's Law p. 49
2 A Discriminative Model of Text Classification p. 51
2.1 Step 1: Bounding the Expected Error Based on the Margin p. 51
2.2 Step 2: Homogeneous TCat-Concepts as a Model of Text-Classification Tasks p. 53
2.3 Step 3: Learnability of TCat-Concepts p. 59
3 Comparing the Theoretical Model with Experimental Results p. 64
4 Sensitivity Analysis: Difficult and Easy Learning Tasks p. 66
4.1 Influence of Occurrence Frequency p. 66
4.2 Discriminative Power of Term Sets p. 68
4.3 Level of Redundancy p. 68
5 Noisy TCat-Concepts p. 69
6 Limitations of the Model and Open Questions p. 72
7 Related Work p. 72
8 Summary and Conclusions p. 74
5. Efficient Performance Estimators for SVMs p. 75
1 Generic Performance Estimators p. 76
1.1 Training Error p. 76
1.2 Hold-Out Testing p. 77
1.3 Bootstrap and Jackknife p. 78
1.4 Cross-Validation and Leave-One-Out p. 79
2 ξα-Estimators p. 81
2.1 Error Rate p. 82
2.2 Recall, Precision, and F1 p. 89
3 Fast Leave-One-Out Estimation p. 93
4 Experiments p. 94
4.1 How Large are Bias and Variance of the ξα-Estimators? p. 95
4.2 What is the Influence of the Training Set Size? p. 99
4.3 How Large is the Efficiency Improvement for Exact Leave-One-Out? p. 101
5 Summary and Conclusions p. 102
Part II: Methods
6. Inductive Text Classification p. 103
1 Learning Task p. 104
2 Automatic Model and Parameter Selection p. 105
2.1 Leave-One-Out Estimator of the PRBEP p. 106
2.2 ξα-Estimator of the PRBEP p. 106
2.3 Model-Selection Algorithm p. 108
3 Experiments p. 108
3.1 Word Weighting, Stemming and Stopword Removal p. 108
3.2 Trading Off Training Error vs. Complexity p. 111
3.3 Non-Linear Classification Rules p. 113
3.4 Comparison with Conventional Methods p. 113
4 Related Work p. 116
5 Summary and Conclusions p. 117
7. Transductive Text Classification p. 119
1 Learning Task p. 120
2 Transductive Support Vector Machines p. 121
3 What Makes TSVMs Well Suited for Text Classification? p. 123
3.1 An Intuitive Example p. 123
3.2 Transductive Learning of TCat-Concepts p. 125
4 Experiments p. 127
5 Constraints on the Transductive Hyperplane p. 130
6 Relation to Other Approaches Using Unlabeled Data p. 133
6.1 Probabilistic Approaches Using EM p. 133
6.2 Co-Training p. 134
6.3 Other Work on Transduction p. 139
7 Summary and Conclusions p. 139
Part III: Algorithms
8. Training Inductive Support Vector Machines p. 141
1 Problem and Approach p. 142
2 General Decomposition Algorithm p. 143
3 Selecting a Good Working Set p. 145
3.1 Convergence p. 145
3.2 How to Compute the Working Set p. 146
4 Shrinking: Reducing the Number of Variables p. 146
5 Efficient Implementation p. 148
5.1 Termination Criteria p. 148
5.2 Computing the Gradient and the Termination Criteria Efficiently p. 149
5.3 What are the Computational Resources Needed in Each Iteration? p. 150
5.4 Caching Kernel Evaluations p. 151
5.5 How to Solve the QP on the Working Set p. 152
6 Related Work p. 152
7 Experiments p. 154
7.1 Training Times for Reuters, WebKB, and Ohsumed p. 154
7.2 How does Training Time Scale with the Number of Training Examples? p. 154
7.3 What is the Influence of the Working-Set-Selection Strategy? p. 160
7.4 What is the Influence of Caching? p. 161
7.5 What is the Influence of Shrinking? p. 161
8 Summary and Conclusions p. 162
9. Training Transductive Support Vector Machines p. 163
1 Problem and Approach p. 163
2 The TSVM Algorithm p. 165
3 Analysis of the Algorithm p. 166
3.1 How does the Algorithm Work? p. 166
3.2 Convergence p. 168
4 Experiments p. 169
4.1 Does the Algorithm Effectively Maximize Margin? p. 169
4.2 Training Times for Reuters, WebKB, and Ohsumed p. 170
4.3 How does Training Time Scale with the Number of Training Examples? p. 170
4.4 How does Training Time Scale with the Number of Test Examples? p. 172
5 Related Work p. 172
6 Summary and Conclusions p. 174
10. Conclusions p. 175
1 Open Question p. 177
Bibliography p. 180
Appendices p. 197
SVM-Light Commands and Options p. 197
Index p. 203
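The appendix listed above documents the commands and options of SVM-Light, the author's SVM implementation. As a quick orientation, and showing only the basic, well-documented invocation (the file names here are invented), training and classification follow this pattern:

# Train a model from labeled examples, then classify new examples.
# -c sets the soft-margin trade-off parameter C, as in Chapter 3.
svm_learn -c 1.0 train.dat model.dat
svm_classify test.dat model.dat predictions.dat

Each line of the input files encodes one document in SVM-Light's sparse format: a target label followed by feature:value pairs in increasing feature order, for example:

+1 1027:0.43 2591:0.12 4811:0.27

The feature numbers and weights above are invented; in the text-classification setting they would correspond to term indices and TF-IDF weights. The appendix itself covers the full set of options.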