Title:
Machine learning in non-stationary environments : introduction to covariate shift adaptation
Personal Author:
Sugiyama, Masashi
Series:
Adaptive computation and machine learning
Publication Information:
Cambridge, Mass. : MIT Press, c2012
Physical Description:
xiv, 261 p. : ill. ; 24 cm.
ISBN:
9780262017091
Subject Term:
Added Author:
Kawanabe, Motoaki

Available:

Item Barcode      Call Number        Material Type      Item Category 1
30000010301948    Q325.5 S84 2012    Open Access Book   Book

Summary

Theory, algorithms, and applications of machine learning techniques to overcome "covariate shift" non-stationarity.

As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning's greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity.
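
To make the setting concrete (this illustration is not taken from the book itself), covariate shift adaptation is commonly handled by importance weighting: each training loss is reweighted by the density ratio w(x) = p_test(x) / p_train(x), so that learning from the training data mimics learning under the test input distribution. Below is a minimal Python sketch of importance-weighted least squares, assuming the two input densities are known; the book devotes a full chapter to estimating this ratio directly when they are not.

import numpy as np

# Toy covariate shift: training and test input densities differ,
# but the input-output relation y = sin(x) + noise stays the same.
rng = np.random.default_rng(0)
x_tr = rng.normal(1.0, 0.5, 200)                      # training inputs ~ p_train(x)
y_tr = np.sin(x_tr) + 0.1 * rng.standard_normal(200)

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Importance weights w(x) = p_test(x) / p_train(x); densities assumed known here.
w = gauss(x_tr, 2.0, 0.5) / gauss(x_tr, 1.0, 0.5)

# Importance-weighted least squares for a linear model y = a*x + b:
# minimize sum_i w_i * (y_i - a*x_i - b)^2.
Phi = np.column_stack([x_tr, np.ones_like(x_tr)])
theta = np.linalg.solve(Phi.T @ (w[:, None] * Phi), Phi.T @ (w * y_tr))
print("importance-weighted fit (a, b):", theta)

Setting all weights to 1 recovers ordinary least squares, which under covariate shift is reliable only when the model is correctly specified; the weighting corrects the resulting bias for misspecified models at the cost of higher variance, a trade-off the book addresses with adaptive and regularized variants of importance-weighted learning.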

After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interfaces, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.


Author Notes

Masashi Sugiyama is Associate Professor in the Department of Computer Science at Tokyo Institute of Technology. Motoaki Kawanabe is a Postdoctoral Researcher in Intelligent Data Analysis at the Fraunhofer FIRST Institute, Berlin. In October 2011, he moved to Advanced Telecommunications Research Institute International (ATR) in Kyoto, Japan.


Table of Contents

Foreword
Preface
I Introduction
1 Introduction and Problem Formulation
1.1 Machine Learning under Covariate Shift
1.2 Quick Tour of Covariate Shift Adaptation
1.3 Problem Formulation
1.3.1 Function Learning from Examples
1.3.2 Loss Functions
1.3.3 Generalization Error
1.3.4 Covariate Shift
1.3.5 Models for Function Learning
1.3.6 Specification of Models
1.4 Structure of This Book
1.4.1 Part II: Learning under Covariate Shift
1.4.2 Part III: Learning Causing Covariate Shift
II Learning Under Covariate Shift
2 Function Approximation
2.1 Importance-Weighting Techniques for Covariate Shift Adaptation
2.1.1 Importance-Weighted ERM
2.1.2 Adaptive IWERM
2.1.3 Regularized IWERM
2.2 Examples of Importance-Weighted Regression Methods
2.2.1 Squared Loss: Least-Squares Regression
2.2.2 Absolute Loss: Least-Absolute Regression
2.2.3 Huber Loss: Huber Regression
2.2.4 Deadzone-Linear Loss: Support Vector Regression
2.3 Examples of Importance-Weighted Classification Methods
2.3.1 Squared Loss: Fisher Discriminant Analysis
2.3.2 Logistic Loss: Logistic Regression Classifier
2.3.3 Hinge Loss: Support Vector Machine
2.3.4 Exponential Loss: Boosting
2.4 Numerical Examples
2.4.1 Regression
2.4.2 Classification
2.5 Summary and Discussion
3 Model Selection
3.1 Importance-Weighted Akaike Information Criterion
3.2 Importance-Weighted Subspace Information Criterion
3.2.1 Input Dependence vs. Input Independence in Generalization Error Analysis
3.2.2 Approximately Correct Models
3.2.3 Input-Dependent Analysis of Generalization Error
3.3 Importance-Weighted Cross-Validation
3.4 Numerical Examples
3.4.1 Regression
3.4.2 Classification
3.5 Summary and Discussion
4 Importance Estimation
4.1 Kernel Density Estimation
4.2 Kernel Mean Matching
4.3 Logistic Regression
4.4 Kullback-Leibler Importance Estimation Procedure
4.4.1 Algorithm
4.4.2 Model Selection by Cross-Validation
4.4.3 Basis Function Design
4.5 Least-Squares Importance Fitting
4.5.1 Algorithm
4.5.2 Basis Function Design and Model Selection
4.5.3 Regularization Path Tracking
4.6 Unconstrained Least-Squares Importance Fitting
4.6.1 Algorithm
4.6.2 Analytic Computation of Leave-One-Out Cross-Validation
4.7 Numerical Examples
4.7.1 Setting
4.7.2 Importance Estimation by KLIEP
4.7.3 Covariate Shift Adaptation by IWLS and IWCV
4.8 Experimental Comparison
4.9 Summary
5 Direct Density-Ratio Estimation with Dimensionality Reduction
5.1 Density Difference in Hetero-Distributional Subspace
5.2 Characterization of Hetero-Distributional Subspace
5.3 Identifying Hetero-Distributional Subspace
5.3.1 Basic Idea
5.3.2 Fisher Discriminant Analysis
5.3.3 Local Fisher Discriminant Analysis
5.4 Using LFDA for Finding Hetero-Distributional Subspace
5.5 Density-Ratio Estimation in the Hetero-Distributional Subspace
5.6 Numerical Examples
5.6.1 Illustrative Example
5.6.2 Performance Comparison Using Artificial Data Sets
5.7 Summary
6 Relation to Sample Selection Bias
6.1 Heckman's Sample Selection Model
6.2 Distributional Change and Sample Selection Bias
6.3 The Two-Step Algorithm
6.4 Relation to Covariate Shift Approach
7 Applications of Covariate Shift Adaptation
7.1 Brain-Computer Interface
7.1.1 Background
7.1.2 Experimental Setup
7.1.3 Experimental Results
7.2 Speaker Identification
7.2.1 Background
7.2.2 Formulation
7.2.3 Experimental Results
7.3 Natural Language Processing
7.3.1 Formulation
7.3.2 Experimental Results
7.4 Perceived Age Prediction from Face Images
7.4.1 Background
7.4.2 Formulation
7.4.3 Incorporating Characteristics of Human Age Perception
7.4.4 Experimental Results
7.5 Human Activity Recognition from Accelerometric Data
7.5.1 Background
7.5.2 Importance-Weighted Least-Squares Probabilistic Classifier
7.5.3 Experimental Results
7.6 Sample Reuse in Reinforcement Learning
7.6.1 Markov Decision Problems
7.6.2 Policy Iteration
7.6.3 Value Function Approximation
7.6.4 Sample Reuse by Covariate Shift Adaptation
7.6.5 On-Policy vs. Off-Policy
7.6.6 Importance Weighting in Value Function Approximation
7.6.7 Automatic Selection of the Flattening Parameter
7.6.8 Sample Reuse Policy Iteration
7.6.9 Robot Control Experiments
III Learning Causing Covariate Shift
8 Active Learning
8.1 Preliminaries
8.1.1 Setup
8.1.2 Decomposition of Generalization Error
8.1.3 Basic Strategy of Active Learning
8.2 Population-Based Active Learning Methods
8.2.1 Classical Method of Active Learning for Correct Models
8.2.2 Limitations of Classical Approach and Countermeasures
8.2.3 Input-Independent Variance-Only Method
8.2.4 Input-Dependent Variance-Only Method
8.2.5 Input-Independent Bias-and-Variance Approach
8.3 Numerical Examples of Population-Based Active Learning Methods
8.3.1 Setup
8.3.2 Accuracy of Generalization Error Estimation
8.3.3 Obtained Generalization Error
8.4 Pool-Based Active Learning Methods
8.4.1 Classical Active Learning Method for Correct Models and Its Limitations
8.4.2 Input-Independent Variance-Only Method
8.4.3 Input-Dependent Variance-Only Method
8.4.4 Input-Independent Bias-and-Variance Approach
8.5 Numerical Examples of Pool-Based Active Learning Methods
8.6 Summary and Discussion
9 Active Learning with Model Selection
9.1 Direct Approach and the Active Learning/Model Selection Dilemma
9.2 Sequential Approach
9.3 Batch Approach
9.4 Ensemble Active Learning
9.5 Numerical Examples
9.5.1 Setting
9.5.2 Analysis of Batch Approach
9.5.3 Analysis of Sequential Approach
9.5.4 Comparison of Obtained Generalization Error
9.6 Summary and Discussion
10 Applications of Active Learning
10.1 Design of Efficient Exploration Strategies in Reinforcement Learning
10.1.1 Efficient Exploration with Active Learning
10.1.2 Reinforcement Learning Revisited
10.1.3 Decomposition of Generalization Error
10.1.4 Estimating Generalization Error for Active Learning
10.1.5 Designing Sampling Policies
10.1.6 Active Learning in Policy Iteration
10.1.7 Robot Control Experiments
10.2 Wafer Alignment in Semiconductor Exposure Apparatus
IV Conclusions
11 Conclusions and Future Prospects
11.1 Conclusions
11.2 Future Prospects
Appendix: List of Symbols and Abbreviations
Bibliography
Index