Title:
Semisupervised learning in computational linguistics
Personal Author:
Abney, Steven
Series:
Computer science and data analysis series
Publication Information:
Boca Raton, FL : Chapman & Hall/CRC, 2008
ISBN:
9781584885597

Available:

Item Barcode: 30000010161065
Call Number: P98.3 A26 2008
Material Type: Open Access Book
Item Category 1: Book

Summary

The rapid advancement in the theoretical understanding of statistical and machine learning methods for semisupervised learning has made it difficult for nonspecialists to keep up to date in the field. Providing a broad, accessible treatment of the theory as well as linguistic applications, Semisupervised Learning for Computational Linguistics offers self-contained coverage of semisupervised methods that includes background material on supervised and unsupervised learning.

The book presents a brief history of semisupervised learning and its place in the spectrum of learning methods before moving on to discuss well-known natural language processing methods, such as self-training and co-training. It then centers on machine learning techniques, including the boundary-oriented methods of perceptrons, boosting, support vector machines (SVMs), and the null-category noise model. In addition, the book covers clustering, the expectation-maximization (EM) algorithm, related generative methods, and agreement methods. It concludes with the graph-based method of label propagation as well as a detailed discussion of spectral methods.

Taking an intuitive approach to the material, this lucid book facilitates the application of semisupervised learning methods to natural language processing and provides the framework and motivation for a more systematic study of machine learning.


Author Notes

Abney, Steven


Table of Contents

1 Introduction p. 1
1.1 A brief history p. 1
1.1.1 Probabilistic methods in computational linguistics p. 1
1.1.2 Supervised and unsupervised training p. 2
1.1.3 Semisupervised learning p. 3
1.2 Semisupervised learning p. 4
1.2.1 Major varieties of learning problem p. 4
1.2.2 Motivation p. 6
1.2.3 Evaluation p. 7
1.2.4 Active learning p. 8
1.3 Organization and assumptions p. 8
1.3.1 Leading ideas p. 8
1.3.2 Mathematical background p. 10
1.3.3 Notation p. 11
2 Self-training and Co-training p. 13
2.1 Classification p. 13
2.1.1 The standard setting p. 13
2.1.2 Features and rules p. 14
2.1.3 Decision lists p. 16
2.2 Self-training p. 18
2.2.1 The algorithm p. 19
2.2.2 Parameters and variants p. 20
2.2.3 Evaluation p. 23
2.2.4 Symmetry of features and instances p. 25
2.2.5 Related algorithms p. 27
2.3 Co-Training p. 28
3 Applications of Self-Training and Co-Training p. 31
3.1 Part-of-speech tagging p. 31
3.2 Information extraction p. 33
3.3 Parsing p. 35
3.4 Word senses p. 36
3.4.1 WordNet p. 36
3.4.2 Word-sense disambiguation p. 38
3.4.3 Taxonomic inference p. 40
4 Classification p. 43
4.1 Two simple classifiers p. 43
4.1.1 Naive Bayes p. 43
4.1.2 k-nearest-neighbor classifier p. 45
4.2 Abstract setting p. 48
4.2.1 Function approximation p. 48
4.2.2 Defining success p. 50
4.2.3 Fit and simplicity p. 52
4.3 Evaluating detectors and classifiers that abstain p. 53
4.3.1 Confidence-rated classifiers p. 53
4.3.2 Measures for detection p. 54
4.3.3 Idealized performance curves p. 57
4.3.4 The multiclass case p. 59
4.4 Binary classifiers and ECOC p. 62
5 Mathematics for Boundary-Oriented Methods p. 67
5.1 Linear separators p. 67
5.1.1 Representing a hyperplane p. 67
5.1.2 Eliminating the threshold p. 69
5.1.3 The point-normal form p. 70
5.1.4 Naive Bayes decision boundary p. 72
5.2 The gradient p. 74
5.2.1 Graphs and domains p. 74
5.2.2 Convexity p. 76
5.2.3 Differentiation of vector and matrix expressions p. 79
5.2.4 An example: linear regression p. 81
5.3 Constrained optimization p. 83
5.3.1 Optimization p. 83
5.3.2 Equality constraints p. 84
5.3.3 Inequality constraints p. 87
5.3.4 The Wolfe dual p. 91
6 Boundary-Oriented Methods p. 95
6.1 The perceptron p. 97
6.1.1 The algorithm p. 97
6.1.2 An example p. 99
6.1.3 Convergence p. 100
6.1.4 The perceptron algorithm as gradient descent p. 101
6.2 Game self-teaching p. 103
6.3 Boosting p. 105
6.3.1 Abstention p. 110
6.3.2 Semisupervised boosting p. 111
6.3.3 Co-boosting p. 113
6.4 Support Vector Machines (SVMs) p. 114
6.4.1 The margin p. 114
6.4.2 Maximizing the margin p. 116
6.4.3 The nonseparable case p. 119
6.4.4 Slack in the separable case p. 121
6.4.5 Multiple slack points p. 123
6.4.6 Transductive SVMs p. 125
6.4.7 Training a transductive SVM p. 127
6.5 Null-category noise model p. 129
7 Clustering p. 131
7.1 Cluster and label p. 131
7.2 Clustering concepts p. 132
7.2.1 Objective p. 132
7.2.2 Distance and similarity p. 133
7.2.3 Graphs p. 136
7.3 Hierarchical clustering p. 137
7.4 Self-training revisited p. 139
7.4.1 k-means clustering p. 139
7.4.2 Pseudo relevance feedback p. 140
7.5 Graph mincut p. 143
7.6 Label propagation p. 146
7.6.1 Clustering by propagation p. 146
7.6.2 Self-training as propagation p. 147
7.6.3 Co-training as propagation p. 150
7.7 Bibliographic notes p. 152
8 Generative Models p. 153
8.1 Gaussian mixtures p. 153
8.1.1 Definition and geometric interpretation p. 153
8.1.2 The linear discriminant decision boundary p. 156
8.1.3 Decision-directed approximation p. 159
8.1.4 McLachlan's algorithm p. 162
8.2 The EM algorithm p. 163
8.2.1 Maximizing likelihood p. 163
8.2.2 Relative frequency estimation p. 164
8.2.3 Divergence p. 166
8.2.4 The EM algorithm p. 169
9 Agreement Constraints p. 175
9.1 Co-training p. 175
9.1.1 The conditional independence assumption p. 176
9.1.2 The power of conditional independence p. 178
9.2 Agreement-based self-teaching p. 182
9.3 Random fields p. 184
9.3.1 Applied to self-training and co-training p. 184
9.3.2 Gibbs sampling p. 186
9.3.3 Markov chains and random walks p. 187
9.4 Bibliographic notes p. 192
10 Propagation Methods p. 193
10.1 Label propagation p. 194
10.2 Random walks p. 196
10.3 Harmonic functions p. 198
10.4 Fluids p. 203
10.4.1 Flow p. 203
10.4.2 Pressure p. 205
10.4.3 Conservation of energy p. 209
10.4.4 Thomson's principle p. 210
10.5 Computing the solution p. 213
10.6 Graph mincuts revisited p. 215
10.7 Bibliographic notes p. 220
11 Mathematics for Spectral Methods p. 221
11.1 Some basic concepts p. 221
11.1.1 The norm of a vector p. 221
11.1.2 Matrices as linear operators p. 222
11.1.3 The column space p. 222
11.2 Eigenvalues and eigenvectors p. 224
11.2.1 Definition of eigenvalues and eigenvectors p. 224
11.2.2 Diagonalization p. 225
11.2.3 Orthogonal diagonalization p. 226
11.3 Eigenvalues and the scaling effects of a matrix p. 227
11.3.1 Matrix norms p. 227
11.3.2 The Rayleigh quotient p. 228
11.3.3 The 2 x 2 case p. 230
11.3.4 The general case p. 232
11.3.5 The Courant-Fischer minimax theorem p. 234
11.4 Bibliographic notes p. 236
12 Spectral Methods p. 237
12.1 Simple harmonic motion p. 237
12.1.1 Harmonics p. 237
12.1.2 Mixtures of harmonics p. 239
12.1.3 An oscillating particle p. 241
12.1.4 A vibrating string p. 243
12.2 Spectra of matrices and graphs p. 251
12.2.1 The spectrum of a matrix p. 252
12.2.2 Relating matrices and graphs p. 253
12.2.3 The Laplacian matrix and graph spectrum p. 256
12.3 Spectral clustering p. 257
12.3.1 The second smallest eigenvector of the Laplacian p. 257
12.3.2 The cut size and the Laplacian p. 259
12.3.3 Approximating cut size p. 260
12.3.4 Minimizing cut size p. 262
12.3.5 Ratiocut p. 263
12.4 Spectral methods for semisupervised learning p. 265
12.4.1 Harmonics and harmonic functions p. 265
12.4.2 Eigenvalues and energy p. 267
12.4.3 The Laplacian and random fields p. 268
12.4.4 Harmonic functions and the Laplacian p. 270
12.4.5 Using the Laplacian for regularization p. 272
12.4.6 Transduction to induction p. 274
12.5 Bibliographic notes p. 275
Bibliography p. 277
Index p. 301