Title:
Machine learning in non-stationary environments : introduction to covariate shift adaptation
Personal Author:
Sugiyama, Masashi
Series:
Adaptive computation and machine learning
Publication Information:
Cambridge, Mass. : MIT Press, c2012
Physical Description:
xiv, 261 p. : ill. ; 24 cm.
ISBN:
9780262017091
Subject Term:
Added Author:
Kawanabe, Motoaki

Available:

Item Barcode      Call Number        Material Type      Item Category 1
30000010301948    Q325.5 S84 2012    Open Access Book   Book

Summary

Theory, algorithms, and applications of machine learning techniques to overcome "covariate shift" non-stationarity.

As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning's greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity.
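
To make the setting concrete (this illustration is not taken from the book itself), covariate shift adaptation is commonly handled by importance weighting: each training loss is reweighted by the density ratio w(x) = p_test(x) / p_train(x), so that learning from the training data mimics learning under the test input distribution. Below is a minimal Python sketch of importance-weighted least squares, assuming the two input densities are known; the book devotes a full chapter to estimating this ratio directly when they are not.

import numpy as np

# Toy covariate shift: training and test input densities differ,
# but the input-output relation y = sin(x) + noise stays the same.
rng = np.random.default_rng(0)
x_tr = rng.normal(1.0, 0.5, 200)                      # training inputs ~ p_train(x)
y_tr = np.sin(x_tr) + 0.1 * rng.standard_normal(200)

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Importance weights w(x) = p_test(x) / p_train(x); densities assumed known here.
w = gauss(x_tr, 2.0, 0.5) / gauss(x_tr, 1.0, 0.5)

# Importance-weighted least squares for a linear model y = a*x + b:
# minimize sum_i w_i * (y_i - a*x_i - b)^2.
Phi = np.column_stack([x_tr, np.ones_like(x_tr)])
theta = np.linalg.solve(Phi.T @ (w[:, None] * Phi), Phi.T @ (w * y_tr))
print("importance-weighted fit (a, b):", theta)

Setting all weights to 1 recovers ordinary least squares, which under covariate shift is reliable only when the model is correctly specified; the weighting corrects the resulting bias for misspecified models at the cost of higher variance, a trade-off the book addresses with adaptive and regularized variants of importance-weighted learning.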

After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interfaces, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.


Author Notes

Masashi Sugiyama is Associate Professor in the Department of Computer Science at Tokyo Institute of Technology. Motoaki Kawanabe is a Postdoctoral Researcher in Intelligent Data Analysis at the Fraunhofer FIRST Institute, Berlin. In October 2011, he moved to Advanced Telecommunications Research Institute International (ATR) in Kyoto, Japan.


Table of Contents

Foreword
Preface
I Introduction
1 Introduction and Problem Formulation
1.1 Machine Learning under Covariate Shift
1.2 Quick Tour of Covariate Shift Adaptation
1.3 Problem Formulation
1.3.1 Function Learning from Examples
1.3.2 Loss Functions
1.3.3 Generalization Error
1.3.4 Covariate Shift
1.3.5 Models for Function Learning
1.3.6 Specification of Models
1.4 Structure of This Book
1.4.1 Part II: Learning under Covariate Shift
1.4.2 Part III: Learning Causing Covariate Shift
II Learning Under Covariate Shift
2 Function Approximation
2.1 Importance-Weighting Techniques for Covariate Shift Adaptation
2.1.1 Importance-Weighted ERM
2.1.2 Adaptive IWERM
2.1.3 Regularized IWERM
2.2 Examples of Importance-Weighted Regression Methods
2.2.1 Squared Loss: Least-Squares Regression
2.2.2 Absolute Loss: Least-Absolute Regression
2.2.3 Huber Loss: Huber Regression
2.2.4 Deadzone-Linear Loss: Support Vector Regression
2.3 Examples of Importance-Weighted Classification Methods
2.3.1 Squared Loss: Fisher Discriminant Analysis
2.3.2 Logistic Loss: Logistic Regression Classifier
2.3.3 Hinge Loss: Support Vector Machine
2.3.4 Exponential Loss: Boosting
2.4 Numerical Examples
2.4.1 Regression
2.4.2 Classification
2.5 Summary and Discussion
3 Model Selection
3.1 Importance-Weighted Akaike Information Criterion
3.2 Importance-Weighted Subspace Information Criterion
3.2.1 Input Dependence vs. Input Independence in Generalization Error Analysis
3.2.2 Approximately Correct Models
3.2.3 Input-Dependent Analysis of Generalization Error
3.3 Importance-Weighted Cross-Validation
3.4 Numerical Examples
3.4.1 Regression
3.4.2 Classification
3.5 Summary and Discussion
4 Importance Estimation
4.1 Kernel Density Estimation
4.2 Kernel Mean Matching
4.3 Logistic Regression
4.4 Kullback-Leibler Importance Estimation Procedure
4.4.1 Algorithm
4.4.2 Model Selection by Cross-Validation
4.4.3 Basis Function Design
4.5 Least-Squares Importance Fitting
4.5.1 Algorithm
4.5.2 Basis Function Design and Model Selection
4.5.3 Regularization Path Tracking
4.6 Unconstrained Least-Squares Importance Fitting
4.6.1 Algorithm
4.6.2 Analytic Computation of Leave-One-Out Cross-Validation
4.7 Numerical Examples
4.7.1 Setting
4.7.2 Importance Estimation by KLIEP
4.7.3 Covariate Shift Adaptation by IWLS and IWCV
4.8 Experimental Comparison
4.9 Summary
5 Direct Density-Ratio Estimation with Dimensionality Reduction
5.1 Density Difference in Hetero-Distributional Subspace
5.2 Characterization of Hetero-Distributional Subspace
5.3 Identifying Hetero-Distributional Subspace
5.3.1 Basic Idea
5.3.2 Fisher Discriminant Analysis
5.3.3 Local Fisher Discriminant Analysis
5.4 Using LFDA for Finding Hetero-Distributional Subspace
5.5 Density-Ratio Estimation in the Hetero-Distributional Subspace
5.6 Numerical Examples
5.6.1 Illustrative Example
5.6.2 Performance Comparison Using Artificial Data Sets
5.7 Summary
6 Relation to Sample Selection Bias
6.1 Heckman's Sample Selection Model
6.2 Distributional Change and Sample Selection Bias
6.3 The Two-Step Algorithm
6.4 Relation to Covariate Shift Approach
7 Applications of Covariate Shift Adaptation
7.1 Brain-Computer Interface
7.1.1 Background
7.1.2 Experimental Setup
7.1.3 Experimental Results
7.2 Speaker Identification
7.2.1 Background
7.2.2 Formulation
7.2.3 Experimental Results
7.3 Natural Language Processing
7.3.1 Formulation
7.3.2 Experimental Results
7.4 Perceived Age Prediction from Face Images
7.4.1 Background
7.4.2 Formulation
7.4.3 Incorporating Characteristics of Human Age Perception
7.4.4 Experimental Results
7.5 Human Activity Recognition from Accelerometric Data
7.5.1 Background
7.5.2 Importance-Weighted Least-Squares Probabilistic Classifier
7.5.3 Experimental Results
7.6 Sample Reuse in Reinforcement Learning
7.6.1 Markov Decision Problems
7.6.2 Policy Iteration
7.6.3 Value Function Approximation
7.6.4 Sample Reuse by Covariate Shift Adaptation
7.6.5 On-Policy vs. Off-Policy
7.6.6 Importance Weighting in Value Function Approximation
7.6.7 Automatic Selection of the Flattening Parameter
7.6.8 Sample Reuse Policy Iteration
7.6.9 Robot Control Experiments
III Learning Causing Covariate Shift
8 Active Learning
8.1 Preliminaries
8.1.1 Setup
8.1.2 Decomposition of Generalization Error
8.1.3 Basic Strategy of Active Learning
8.2 Population-Based Active Learning Methods
8.2.1 Classical Method of Active Learning for Correct Models
8.2.2 Limitations of Classical Approach and Countermeasures
8.2.3 Input-Independent Variance-Only Method
8.2.4 Input-Dependent Variance-Only Method
8.2.5 Input-Independent Bias-and-Variance Approach
8.3 Numerical Examples of Population-Based Active Learning Methods
8.3.1 Setup
8.3.2 Accuracy of Generalization Error Estimation
8.3.3 Obtained Generalization Error
8.4 Pool-Based Active Learning Methods
8.4.1 Classical Active Learning Method for Correct Models and Its Limitations
8.4.2 Input-Independent Variance-Only Method
8.4.3 Input-Dependent Variance-Only Method
8.4.4 Input-Independent Bias-and-Variance Approach
8.5 Numerical Examples of Pool-Based Active Learning Methods
8.6 Summary and Discussion
9 Active Learning with Model Selection
9.1 Direct Approach and the Active Learning/Model Selection Dilemma
9.2 Sequential Approach
9.3 Batch Approach
9.4 Ensemble Active Learning
9.5 Numerical Examples
9.5.1 Setting
9.5.2 Analysis of Batch Approach
9.5.3 Analysis of Sequential Approach
9.5.4 Comparison of Obtained Generalization Error
9.6 Summary and Discussion
10 Applications of Active Learning
10.1 Design of Efficient Exploration Strategies in Reinforcement Learning
10.1.1 Efficient Exploration with Active Learning
10.1.2 Reinforcement Learning Revisited
10.1.3 Decomposition of Generalization Error
10.1.4 Estimating Generalization Error for Active Learning
10.1.5 Designing Sampling Policies
10.1.6 Active Learning in Policy Iteration
10.1.7 Robot Control Experiments
10.2 Wafer Alignment in Semiconductor Exposure Apparatus
IV Conclusions
11 Conclusions and Future Prospects
11.1 Conclusions
11.2 Future Prospects
Appendix: List of Symbols and Abbreviations
Bibliography
Index