| Library | Item Barcode | Call Number | Material Type | Item Category 1 | Status |
|---|---|---|---|---|---|
| | 30000010161065 | P98.3 A26 2008 | Open Access Book | Book | |
Summary
The rapid advancement in the theoretical understanding of statistical and machine learning methods for semisupervised learning has made it difficult for nonspecialists to keep up to date in the field. Providing a broad, accessible treatment of the theory as well as linguistic applications, Semisupervised Learning for Computational Linguistics offers self-contained coverage of semisupervised methods that includes background material on supervised and unsupervised learning.
The book presents a brief history of semisupervised learning and its place in the spectrum of learning methods before moving on to discuss well-known natural language processing methods, such as self-training and co-training. It then centers on machine learning techniques, including the boundary-oriented methods of perceptrons, boosting, support vector machines (SVMs), and the null-category noise model. In addition, the book covers clustering, the expectation-maximization (EM) algorithm, related generative methods, and agreement methods. It concludes with the graph-based method of label propagation as well as a detailed discussion of spectral methods.
Taking an intuitive approach to the material, this lucid book facilitates the application of semisupervised learning methods to natural language processing and provides the framework and motivation for a more systematic study of machine learning.
Author Notes
Abney, Steven
Table of Contents
1 Introduction | p. 1 |
1.1 A brief history | p. 1 |
1.1.1 Probabilistic methods in computational linguistics | p. 1 |
1.1.2 Supervised and unsupervised training | p. 2 |
1.1.3 Semisupervised learning | p. 3 |
1.2 Semisupervised learning | p. 4 |
1.2.1 Major varieties of learning problem | p. 4 |
1.2.2 Motivation | p. 6 |
1.2.3 Evaluation | p. 7 |
1.2.4 Active learning | p. 8 |
1.3 Organization and assumptions | p. 8 |
1.3.1 Leading ideas | p. 8 |
1.3.2 Mathematical background | p. 10 |
1.3.3 Notation | p. 11 |
2 Self-training and Co-training | p. 13 |
2.1 Classification | p. 13 |
2.1.1 The standard setting | p. 13 |
2.1.2 Features and rules | p. 14 |
2.1.3 Decision lists | p. 16 |
2.2 Self-training | p. 18 |
2.2.1 The algorithm | p. 19 |
2.2.2 Parameters and variants | p. 20 |
2.2.3 Evaluation | p. 23 |
2.2.4 Symmetry of features and instances | p. 25 |
2.2.5 Related algorithms | p. 27 |
2.3 Co-Training | p. 28 |
3 Applications of Self-Training and Co-Training | p. 31 |
3.1 Part-of-speech tagging | p. 31 |
3.2 Information extraction | p. 33 |
3.3 Parsing | p. 35 |
3.4 Word senses | p. 36 |
3.4.1 WordNet | p. 36 |
3.4.2 Word-sense disambiguation | p. 38 |
3.4.3 Taxonomic inference | p. 40 |
4 Classification | p. 43 |
4.1 Two simple classifiers | p. 43 |
4.1.1 Naive Bayes | p. 43 |
4.1.2 k-nearest-neighbor classifier | p. 45 |
4.2 Abstract setting | p. 48 |
4.2.1 Function approximation | p. 48 |
4.2.2 Defining success | p. 50 |
4.2.3 Fit and simplicity | p. 52 |
4.3 Evaluating detectors and classifiers that abstain | p. 53 |
4.3.1 Confidence-rated classifiers | p. 53 |
4.3.2 Measures for detection | p. 54 |
4.3.3 Idealized performance curves | p. 57 |
4.3.4 The multiclass case | p. 59 |
4.4 Binary classifiers and ECOC | p. 62 |
5 Mathematics for Boundary-Oriented Methods | p. 67 |
5.1 Linear separators | p. 67 |
5.1.1 Representing a hyperplane | p. 67 |
5.1.2 Eliminating the threshold | p. 69 |
5.1.3 The point-normal form | p. 70 |
5.1.4 Naive Bayes decision boundary | p. 72 |
5.2 The gradient | p. 74 |
5.2.1 Graphs and domains | p. 74 |
5.2.2 Convexity | p. 76 |
5.2.3 Differentiation of vector and matrix expressions | p. 79 |
5.2.4 An example: linear regression | p. 81 |
5.3 Constrained optimization | p. 83 |
5.3.1 Optimization | p. 83 |
5.3.2 Equality constraints | p. 84 |
5.3.3 Inequality constraints | p. 87 |
5.3.4 The Wolfe dual | p. 91 |
6 Boundary-Oriented Methods | p. 95 |
6.1 The perceptron | p. 97 |
6.1.1 The algorithm | p. 97 |
6.1.2 An example | p. 99 |
6.1.3 Convergence | p. 100 |
6.1.4 The perceptron algorithm as gradient descent | p. 101 |
6.2 Game self-teaching | p. 103 |
6.3 Boosting | p. 105 |
6.3.1 Abstention | p. 110 |
6.3.2 Semisupervised boosting | p. 111 |
6.3.3 Co-boosting | p. 113 |
6.4 Support Vector Machines (SVMs) | p. 114 |
6.4.1 The margin | p. 114 |
6.4.2 Maximizing the margin | p. 116 |
6.4.3 The nonseparable case | p. 119 |
6.4.4 Slack in the separable case | p. 121 |
6.4.5 Multiple slack points | p. 123 |
6.4.6 Transductive SVMs | p. 125 |
6.4.7 Training a transductive SVM | p. 127 |
6.5 Null-category noise model | p. 129 |
7 Clustering | p. 131 |
7.1 Cluster and label | p. 131 |
7.2 Clustering concepts | p. 132 |
7.2.1 Objective | p. 132 |
7.2.2 Distance and similarity | p. 133 |
7.2.3 Graphs | p. 136 |
7.3 Hierarchical clustering | p. 137 |
7.4 Self-training revisited | p. 139 |
7.4.1 k-means clustering | p. 139 |
7.4.2 Pseudo relevance feedback | p. 140 |
7.5 Graph mincut | p. 143 |
7.6 Label propagation | p. 146 |
7.6.1 Clustering by propagation | p. 146 |
7.6.2 Self-training as propagation | p. 147 |
7.6.3 Co-training as propagation | p. 150 |
7.7 Bibliographic notes | p. 152 |
8 Generative Models | p. 153 |
8.1 Gaussian mixtures | p. 153 |
8.1.1 Definition and geometric interpretation | p. 153 |
8.1.2 The linear discriminant decision boundary | p. 156 |
8.1.3 Decision-directed approximation | p. 159 |
8.1.4 McLachlan's algorithm | p. 162 |
8.2 The EM algorithm | p. 163 |
8.2.1 Maximizing likelihood | p. 163 |
8.2.2 Relative frequency estimation | p. 164 |
8.2.3 Divergence | p. 166 |
8.2.4 The EM algorithm | p. 169 |
9 Agreement Constraints | p. 175 |
9.1 Co-training | p. 175 |
9.1.1 The conditional independence assumption | p. 176 |
9.1.2 The power of conditional independence | p. 178 |
9.2 Agreement-based self-teaching | p. 182 |
9.3 Random fields | p. 184 |
9.3.1 Applied to self-training and co-training | p. 184 |
9.3.2 Gibbs sampling | p. 186 |
9.3.3 Markov chains and random walks | p. 187 |
9.4 Bibliographic notes | p. 192 |
10 Propagation Methods | p. 193 |
10.1 Label propagation | p. 194 |
10.2 Random walks | p. 196 |
10.3 Harmonic functions | p. 198 |
10.4 Fluids | p. 203 |
10.4.1 Flow | p. 203 |
10.4.2 Pressure | p. 205 |
10.4.3 Conservation of energy | p. 209 |
10.4.4 Thomson's principle | p. 210 |
10.5 Computing the solution | p. 213 |
10.6 Graph mincuts revisited | p. 215 |
10.7 Bibliographic notes | p. 220 |
11 Mathematics for Spectral Methods | p. 221 |
11.1 Some basic concepts | p. 221 |
11.1.1 The norm of a vector | p. 221 |
11.1.2 Matrices as linear operators | p. 222 |
11.1.3 The column space | p. 222 |
11.2 Eigenvalues and eigenvectors | p. 224 |
11.2.1 Definition of eigenvalues and eigenvectors | p. 224 |
11.2.2 Diagonalization | p. 225 |
11.2.3 Orthogonal diagonalization | p. 226 |
11.3 Eigenvalues and the scaling effects of a matrix | p. 227 |
11.3.1 Matrix norms | p. 227 |
11.3.2 The Rayleigh quotient | p. 228 |
11.3.3 The 2 × 2 case | p. 230 |
11.3.4 The general case | p. 232 |
11.3.5 The Courant-Fischer minimax theorem | p. 234 |
11.4 Bibliographic notes | p. 236 |
12 Spectral Methods | p. 237 |
12.1 Simple harmonic motion | p. 237 |
12.1.1 Harmonics | p. 237 |
12.1.2 Mixtures of harmonics | p. 239 |
12.1.3 An oscillating particle | p. 241 |
12.1.4 A vibrating string | p. 243 |
12.2 Spectra of matrices and graphs | p. 251 |
12.2.1 The spectrum of a matrix | p. 252 |
12.2.2 Relating matrices and graphs | p. 253 |
12.2.3 The Laplacian matrix and graph spectrum | p. 256 |
12.3 Spectral clustering | p. 257 |
12.3.1 The second smallest eigenvector of the Laplacian | p. 257 |
12.3.2 The cut size and the Laplacian | p. 259 |
12.3.3 Approximating cut size | p. 260 |
12.3.4 Minimizing cut size | p. 262 |
12.3.5 Ratiocut | p. 263 |
12.4 Spectral methods for semisupervised learning | p. 265 |
12.4.1 Harmonics and harmonic functions | p. 265 |
12.4.2 Eigenvalues and energy | p. 267 |
12.4.3 The Laplacian and random fields | p. 268 |
12.4.4 Harmonic functions and the Laplacian | p. 270 |
12.4.5 Using the Laplacian for regularization | p. 272 |
12.4.6 Transduction to induction | p. 274 |
12.5 Bibliographic notes | p. 275 |
Bibliography | p. 277 |
Index | p. 301 |