Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation

(cover image unavailable)
Available:

Item Barcode | Call Number | Material Type | Item Category
---|---|---|---
30000010301948 | Q325.5 S84 2012 | Open Access Book | Book

On Order
Summary
Theory, algorithms, and applications of machine learning techniques to overcome "covariate shift" non-stationarity.
As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning's greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity.
After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interfaces, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.
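The summary's central idea, reweighting training examples by the density ratio of test to training inputs, can be sketched in a few lines of NumPy. This is an invented toy example, not code from the book: the target function, the two Gaussian input densities, and all sample sizes are assumptions, and the true densities are taken as known here, whereas in practice they must be estimated (the topic of the book's chapter 4).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Arbitrary nonlinear target; the linear model below is misspecified.
    return np.sin(x) / (1.0 + x**2)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Covariate shift: training inputs ~ N(1, 1), test inputs ~ N(2, 1),
# but y given x follows the same relation in both phases.
n = 200
x_tr = rng.normal(1.0, 1.0, n)
y_tr = f(x_tr) + 0.05 * rng.normal(size=n)

# Importance weights w(x) = p_test(x) / p_train(x), computed from the
# (here assumed known) densities.
w = gauss_pdf(x_tr, 2.0, 1.0) / gauss_pdf(x_tr, 1.0, 1.0)

# Importance-weighted least squares for a linear model y ≈ a*x + b.
X = np.column_stack([x_tr, np.ones(n)])
Xw = X * w[:, None]                       # row-weighted design matrix
theta = np.linalg.solve(Xw.T @ X, Xw.T @ y_tr)

# Plain (unweighted) least squares for comparison.
theta_plain = np.linalg.solve(X.T @ X, X.T @ y_tr)

# Evaluate both fits where the test inputs actually live.
x_te = rng.normal(2.0, 1.0, 1000)
Xte = np.column_stack([x_te, np.ones(1000)])
err_iw = np.mean((Xte @ theta - f(x_te)) ** 2)
err_plain = np.mean((Xte @ theta_plain - f(x_te)) ** 2)
print(f"IWLS test MSE: {err_iw:.4f}, OLS test MSE: {err_plain:.4f}")
```

Because the linear model is misspecified, the unweighted fit minimizes error where the training inputs are dense, while the weighted fit shifts its effort toward the test input region; this is the importance-weighted ERM idea developed in part II of the book.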
Author Notes
Masashi Sugiyama is Associate Professor in the Department of Computer Science at Tokyo Institute of Technology. Motoaki Kawanabe is a Postdoctoral Researcher in Intelligent Data Analysis at the Fraunhofer FIRST Institute, Berlin. In October 2011, he moved to Advanced Telecommunications Research Institute International (ATR) in Kyoto, Japan.
Table of Contents
Foreword | p. xi |
Preface | p. xiii |
I Introduction | |
1 Introduction and Problem Formulation | p. 3 |
1.1 Machine Learning under Covariate Shift | p. 3 |
1.2 Quick Tour of Covariate Shift Adaptation | p. 5 |
1.3 Problem Formulation | p. 7 |
1.3.1 Function Learning from Examples | p. 7 |
1.3.2 Loss Functions | p. 8 |
1.3.3 Generalization Error | p. 9 |
1.3.4 Covariate Shift | p. 9 |
1.3.5 Models for Function Learning | p. 10 |
1.3.6 Specification of Models | p. 13 |
1.4 Structure of This Book | p. 14 |
1.4.1 Part II: Learning under Covariate Shift | p. 14 |
1.4.2 Part III: Learning Causing Covariate Shift | p. 17 |
II Learning Under Covariate Shift | |
2 Function Approximation | p. 21 |
2.1 Importance-Weighting Techniques for Covariate Shift Adaptation | p. 22 |
2.1.1 Importance-Weighted ERM | p. 22 |
2.1.2 Adaptive IWERM | p. 23 |
2.1.3 Regularized IWERM | p. 23 |
2.2 Examples of Importance-Weighted Regression Methods | p. 25 |
2.2.1 Squared Loss: Least-Squares Regression | p. 26 |
2.2.2 Absolute Loss: Least-Absolute Regression | p. 30 |
2.2.3 Huber Loss: Huber Regression | p. 31 |
2.2.4 Deadzone-Linear Loss: Support Vector Regression | p. 33 |
2.3 Examples of Importance-Weighted Classification Methods | p. 35 |
2.3.1 Squared Loss: Fisher Discriminant Analysis | p. 36 |
2.3.2 Logistic Loss: Logistic Regression Classifier | p. 38 |
2.3.3 Hinge Loss: Support Vector Machine | p. 39 |
2.3.4 Exponential Loss: Boosting | p. 40 |
2.4 Numerical Examples | p. 40 |
2.4.1 Regression | p. 40 |
2.4.2 Classification | p. 41 |
2.5 Summary and Discussion | p. 45 |
3 Model Selection | p. 47 |
3.1 Importance-Weighted Akaike Information Criterion | p. 47 |
3.2 Importance-Weighted Subspace Information Criterion | p. 50 |
3.2.1 Input Dependence vs. Input Independence in Generalization Error Analysis | p. 51 |
3.2.2 Approximately Correct Models | p. 53 |
3.2.3 Input-Dependent Analysis of Generalization Error | p. 54 |
3.3 Importance-Weighted Cross-Validation | p. 64 |
3.4 Numerical Examples | p. 66 |
3.4.1 Regression | p. 66 |
3.4.2 Classification | p. 69 |
3.5 Summary and Discussion | p. 70 |
4 Importance Estimation | p. 73 |
4.1 Kernel Density Estimation | p. 73 |
4.2 Kernel Mean Matching | p. 75 |
4.3 Logistic Regression | p. 76 |
4.4 Kullback-Leibler Importance Estimation Procedure | p. 78 |
4.4.1 Algorithm | p. 78 |
4.4.2 Model Selection by Cross-Validation | p. 81 |
4.4.3 Basis Function Design | p. 82 |
4.5 Least-Squares Importance Fitting | p. 83 |
4.5.1 Algorithm | p. 83 |
4.5.2 Basis Function Design and Model Selection | p. 84 |
4.5.3 Regularization Path Tracking | p. 85 |
4.6 Unconstrained Least-Squares Importance Fitting | p. 87 |
4.6.1 Algorithm | p. 87 |
4.6.2 Analytic Computation of Leave-One-Out Cross-Validation | p. 88 |
4.7 Numerical Examples | p. 88 |
4.7.1 Setting | p. 90 |
4.7.2 Importance Estimation by KLIEP | p. 90 |
4.7.3 Covariate Shift Adaptation by IWLS and IWCV | p. 92 |
4.8 Experimental Comparison | p. 94 |
4.9 Summary | p. 101 |
5 Direct Density-Ratio Estimation with Dimensionality Reduction | p. 103 |
5.1 Density Difference in Hetero-Distributional Subspace | p. 103 |
5.2 Characterization of Hetero-Distributional Subspace | p. 104 |
5.3 Identifying Hetero-Distributional Subspace | p. 106 |
5.3.1 Basic Idea | p. 106 |
5.3.2 Fisher Discriminant Analysis | p. 108 |
5.3.3 Local Fisher Discriminant Analysis | p. 109 |
5.4 Using LFDA for Finding Hetero-Distributional Subspace | p. 112 |
5.5 Density-Ratio Estimation in the Hetero-Distributional Subspace | p. 113 |
5.6 Numerical Examples | p. 113 |
5.6.1 Illustrative Example | p. 113 |
5.6.2 Performance Comparison Using Artificial Data Sets | p. 117 |
5.7 Summary | p. 121 |
6 Relation to Sample Selection Bias | p. 125 |
6.1 Heckman's Sample Selection Model | p. 125 |
6.2 Distributional Change and Sample Selection Bias | p. 129 |
6.3 The Two-Step Algorithm | p. 131 |
6.4 Relation to Covariate Shift Approach | p. 134 |
7 Applications of Covariate Shift Adaptation | p. 137 |
7.1 Brain-Computer Interface | p. 137 |
7.1.1 Background | p. 137 |
7.1.2 Experimental Setup | p. 138 |
7.1.3 Experimental Results | p. 140 |
7.2 Speaker Identification | p. 142 |
7.2.1 Background | p. 142 |
7.2.2 Formulation | p. 142 |
7.2.3 Experimental Results | p. 144 |
7.3 Natural Language Processing | p. 149 |
7.3.1 Formulation | p. 149 |
7.3.2 Experimental Results | p. 151 |
7.4 Perceived Age Prediction from Face Images | p. 152 |
7.4.1 Background | p. 152 |
7.4.2 Formulation | p. 153 |
7.4.3 Incorporating Characteristics of Human Age Perception | p. 153 |
7.4.4 Experimental Results | p. 155 |
7.5 Human Activity Recognition from Accelerometric Data | p. 157 |
7.5.1 Background | p. 157 |
7.5.2 Importance-Weighted Least-Squares Probabilistic Classifier | p. 157 |
7.5.3 Experimental Results | p. 160 |
7.6 Sample Reuse in Reinforcement Learning | p. 165 |
7.6.1 Markov Decision Problems | p. 165 |
7.6.2 Policy Iteration | p. 166 |
7.6.3 Value Function Approximation | p. 167 |
7.6.4 Sample Reuse by Covariate Shift Adaptation | p. 168 |
7.6.5 On-Policy vs. Off-Policy | p. 169 |
7.6.6 Importance Weighting in Value Function Approximation | p. 170 |
7.6.7 Automatic Selection of the Flattening Parameter | p. 174 |
7.6.8 Sample Reuse Policy Iteration | p. 175 |
7.6.9 Robot Control Experiments | p. 176 |
III Learning Causing Covariate Shift | |
8 Active Learning | p. 183 |
8.1 Preliminaries | p. 183 |
8.1.1 Setup | p. 183 |
8.1.2 Decomposition of Generalization Error | p. 185 |
8.1.3 Basic Strategy of Active Learning | p. 188 |
8.2 Population-Based Active Learning Methods | p. 188 |
8.2.1 Classical Method of Active Learning for Correct Models | p. 189 |
8.2.2 Limitations of Classical Approach and Countermeasures | p. 190 |
8.2.3 Input-Independent Variance-Only Method | p. 191 |
8.2.4 Input-Dependent Variance-Only Method | p. 193 |
8.2.5 Input-Independent Bias-and-Variance Approach | p. 195 |
8.3 Numerical Examples of Population-Based Active Learning Methods | p. 198 |
8.3.1 Setup | p. 198 |
8.3.2 Accuracy of Generalization Error Estimation | p. 200 |
8.3.3 Obtained Generalization Error | p. 202 |
8.4 Pool-Based Active Learning Methods | p. 204 |
8.4.1 Classical Active Learning Method for Correct Models and Its Limitations | p. 204 |
8.4.2 Input-Independent Variance-Only Method | p. 205 |
8.4.3 Input-Dependent Variance-Only Method | p. 206 |
8.4.4 Input-Independent Bias-and-Variance Approach | p. 207 |
8.5 Numerical Examples of Pool-Based Active Learning Methods | p. 209 |
8.6 Summary and Discussion | p. 212 |
9 Active Learning with Model Selection | p. 215 |
9.1 Direct Approach and the Active Learning/Model Selection Dilemma | p. 215 |
9.2 Sequential Approach | p. 216 |
9.3 Batch Approach | p. 218 |
9.4 Ensemble Active Learning | p. 219 |
9.5 Numerical Examples | p. 220 |
9.5.1 Setting | p. 220 |
9.5.2 Analysis of Batch Approach | p. 221 |
9.5.3 Analysis of Sequential Approach | p. 222 |
9.5.4 Comparison of Obtained Generalization Error | p. 222 |
9.6 Summary and Discussion | p. 223 |
10 Applications of Active Learning | p. 225 |
10.1 Design of Efficient Exploration Strategies in Reinforcement Learning | p. 225 |
10.1.1 Efficient Exploration with Active Learning | p. 225 |
10.1.2 Reinforcement Learning Revisited | p. 226 |
10.1.3 Decomposition of Generalization Error | p. 228 |
10.1.4 Estimating Generalization Error for Active Learning | p. 229 |
10.1.5 Designing Sampling Policies | p. 230 |
10.1.6 Active Learning in Policy Iteration | p. 231 |
10.1.7 Robot Control Experiments | p. 232 |
10.2 Wafer Alignment in Semiconductor Exposure Apparatus | p. 234 |
IV Conclusions | |
11 Conclusions and Future Prospects | p. 241 |
11.1 Conclusions | p. 241 |
11.2 Future Prospects | p. 242 |
Appendix: List of Symbols and Abbreviations | p. 243 |
Bibliography | p. 247 |
Index | p. 259 |