Item Barcode | Call Number | Material Type | Item Category |
---|---|---|---|
30000010344150 | QC20.7.D55 G57 2015 | Open Access Book | Book |
Summary
Ever-more-powerful computing technologies have given rise to an exponentially growing volume of data. Today, massive data sets (with potentially thousands of variables) play an important role in almost every branch of modern human activity, including networks, finance, and genetics. Analyzing such data, however, has presented a challenge for statisticians and data analysts and has required the development of new statistical methods capable of separating the signal from the noise.
Introduction to High-Dimensional Statistics is a concise guide to state-of-the-art models, techniques, and approaches for handling high-dimensional data. The book is intended to expose the reader to the key concepts and ideas in the most simple settings possible while avoiding unnecessary technicalities.
Offering a succinct presentation of the mathematical foundations of high-dimensional statistics, this highly accessible text:
- Describes the challenges related to the analysis of high-dimensional data
- Covers cutting-edge statistical methods including model selection, sparsity and the lasso, aggregation, and learning theory
- Provides detailed exercises at the end of every chapter, with collaborative solutions on a wiki site
- Illustrates concepts with simple but clear practical examples

Introduction to High-Dimensional Statistics is suitable for graduate students and researchers interested in discovering modern statistics for massive data. It can be used as a graduate text or for self-study.
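As an illustrative aside (not taken from the book), the following minimal sketch shows one of the methods named above, the lasso, recovering a sparse signal from noisy data with far more variables than observations. It assumes NumPy and scikit-learn are available; all names and parameter values are chosen purely for illustration.

```python
# Illustrative sketch only (not code from the book): lasso regression on
# synthetic data with many more variables (p) than observations (n).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 100, 1000, 5                 # n observations, p >> n variables, s true signals
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0                         # sparse coefficient vector: the "signal"
y = X @ beta + rng.standard_normal(n)  # observations = signal + noise

fit = Lasso(alpha=0.5).fit(X, y)       # l1-penalized least squares
print("nonzero coefficients found at:", np.flatnonzero(fit.coef_))
```

With signal coefficients this strong, the fitted model is typically supported on (roughly) the first five coordinates, which is the kind of signal-from-noise separation the summary refers to.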
Author Notes
Christophe Giraud was a student of the École Normale Supérieure de Paris, and he received a Ph.D. in probability theory from the University of Paris 6. He was an assistant professor at the University of Nice from 2002 to 2008. He has been an associate professor at the École Polytechnique since 2008 and a professor at Paris-Sud University (Orsay) since 2012. His current research focuses mainly on the statistical theory of high-dimensional data analysis and its applications to the life sciences.
Table of Contents
Preface | p. xiii |
Acknowledgments | p. xv |
1 Introduction | p. 1 |
1.1 High-Dimensional Data | p. 1 |
1.2 Curse of Dimensionality | p. 3 |
1.2.1 Lost in the Immensity of High-Dimensional Spaces | p. 3 |
1.2.2 Fluctuations Cumulate | p. 6 |
1.2.3 An Accumulation of Rare Events May Not Be Rare | p. 10 |
1.2.4 Computational Complexity | p. 13 |
1.3 High-Dimensional Statistics | p. 13 |
1.3.1 Circumventing the Curse of Dimensionality | p. 13 |
1.3.2 A Paradigm Shift | p. 16 |
1.3.3 Mathematics of High-Dimensional Statistics | p. 17 |
1.4 About This Book | p. 18 |
1.4.1 Statistics and Data Analysis | p. 18 |
1.4.2 Purpose of This Book | p. 19 |
1.4.3 Overview | p. 19 |
1.5 Discussion and References | p. 21 |
1.5.1 Take-Home Message | p. 21 |
1.5.2 References | p. 21 |
1.6 Exercises | p. 21 |
1.6.1 Strange Geometry of High-Dimensional Spaces | p. 21 |
1.6.2 Volume of a p-Dimensional Ball | p. 22 |
1.6.3 Tails of a Standard Gaussian Distribution | p. 22 |
1.6.4 Principal Component Analysis | p. 23 |
1.6.5 Basics of Linear Regression | p. 24 |
1.6.6 Concentration of Square Norm of Gaussian Random Variable | p. 24 |
2 Model Selection | p. 27 |
2.1 Statistical Setting | p. 27 |
2.2 To Select among a Collection of Models | p. 30 |
2.2.1 Models and Oracle | p. 30 |
2.2.2 Model Selection Procedures | p. 33 |
2.3 Risk Bound for Model Selection | p. 37 |
2.3.1 Oracle Risk Bound | p. 37 |
2.4 Optimality | p. 41 |
2.4.1 Minimax Optimality | p. 41 |
2.4.2 Frontier of Estimation in High Dimensions | p. 45 |
2.4.3 Minimal Penalties | p. 45 |
2.5 Computational Issues | p. 46 |
2.6 Illustration | p. 47 |
2.7 An Alternative Point of View on Model Selection | p. 49 |
2.8 Discussion and References | p. 50 |
2.8.1 Take-Home Message | p. 50 |
2.8.2 References | p. 50 |
2.9 Exercises | p. 51 |
2.9.1 Orthogonal Design | p. 51 |
2.9.2 Risk Bounds for the Different Sparsity Settings | p. 53 |
2.9.3 Collections of Nested Models | p. 54 |
2.9.4 Segmentation with Dynamic Programming | p. 55 |
2.9.5 Goldenshluger-Lepski Method | p. 56 |
2.9.6 Minimax Lower Bounds | p. 58 |
3 Aggregation of Estimators | p. 61 |
3.1 Introduction | p. 61 |
3.2 Gibbs Mixing of Estimators | p. 61 |
3.3 Oracle Risk Bound | p. 62 |
3.4 Numerical Approximation by Metropolis-Hastings | p. 65 |
3.5 Numerical Illustration | p. 68 |
3.6 Discussion and References | p. 69 |
3.6.1 Take-Home Message | p. 69 |
3.6.2 References | p. 69 |
3.7 Exercises | p. 69 |
3.7.1 Gibbs Distribution | p. 69 |
3.7.2 Orthonormal Setting with Power Law Prior | p. 70 |
3.7.3 Group-Sparse Setting | p. 70 |
3.7.4 Gain of Combining | p. 70 |
3.7.5 Online Aggregation | p. 71 |
4 Convex Criteria | p. 73 |
4.1 Reminder on Convex Multivariate Functions | p. 73 |
4.1.1 Subdifferentials | p. 73 |
4.1.2 Two Useful Properties | p. 74 |
4.2 Lasso Estimator | p. 74 |
4.2.1 Geometric Insights | p. 76 |
4.2.2 Analytic Insights | p. 76 |
4.2.3 Oracle Risk Bound | p. 79 |
4.2.4 Computing the Lasso Estimator | p. 82 |
4.2.5 Removing the Bias of the Lasso Estimator | p. 86 |
4.3 Convex Criteria for Various Sparsity Patterns | p. 87 |
4.3.1 Group-Lasso (Group Sparsity) | p. 87 |
4.3.2 Sparse-Group Lasso (Sparse-Group Sparsity) | p. 90 |
4.3.3 Fused-Lasso (Variation Sparsity) | p. 91 |
4.4 Discussion and References | p. 91 |
4.4.1 Take-Home Message | p. 91 |
4.4.2 References | p. 91 |
4.5 Exercises | p. 92 |
4.5.1 When Is the Lasso Solution Unique? | p. 92 |
4.5.2 Support Recovery via the Witness Approach | p. 93 |
4.5.3 Lower Bound on the Compatibility Constant | p. 94 |
4.5.4 On the Group-Lasso | p. 94 |
4.5.5 Dantzig Selector | p. 96 |
4.5.6 Projection on the ℓ1-Ball | p. 97 |
4.5.7 Ridge and Elastic-Net | p. 98 |
5 Estimator Selection | p. 101 |
5.1 Estimator Selection | p. 102 |
5.2 Cross-Validation Techniques | p. 103 |
5.3 Complexity Selection Techniques | p. 105 |
5.3.1 Coordinate-Sparse Regression | p. 106 |
5.3.2 Group-Sparse Regression | p. 107 |
5.3.3 Multiple Structures | p. 108 |
5.4 Scale-Invariant Criteria | p. 108 |
5.5 References and Discussion | p. 113 |
5.5.1 Take-Home Message | p. 113 |
5.5.2 References | p. 115 |
5.6 Exercises | p. 116 |
5.6.1 Expected V-Fold CV ℓ2-Risk | p. 116 |
5.6.2 Proof of Corollary 5.5 | p. 117 |
5.6.3 Some Properties of Penalty (5.4) | p. 117 |
5.6.4 Selecting the Number of Steps for the Forward Algorithm | p. 119 |
6 Multivariate Regression | p. 121 |
6.1 Statistical Setting | p. 121 |
6.2 A Reminder on Singular Values | p. 122 |
6.3 Low-Rank Estimation | p. 123 |
6.3.1 If We Knew the Rank of A* | p. 123 |
6.3.2 When the Rank of A* Is Unknown | p. 126 |
6.4 Low Rank and Sparsity | p. 129 |
6.4.1 Row-Sparse Matrices | p. 129 |
6.4.2 Criterion for Row-Sparse and Low-Rank Matrices | p. 130 |
6.4.3 Convex Criterion for Low-Rank Matrices | p. 134 |
6.4.4 Convex Criterion for Sparse and Low-Rank Matrices | p. 137 |
6.5 Discussion and References | p. 137 |
6.5.1 Take-Home Message | p. 137 |
6.5.2 References | p. 138 |
6.6 Exercises | p. 138 |
6.6.1 Hard-Thresholding of the Singular Values | p. 138 |
6.6.2 Exact Rank Recovery | p. 138 |
6.6.3 Rank Selection with Unknown Variance | p. 139 |
7 Graphical Models | p. 141 |
7.1 Reminder on Conditional Independence | p. 142 |
7.2 Graphical Models | p. 142 |
7.2.1 Directed Acyclic Graphical Models | p. 142 |
7.2.2 Nondirected Models | p. 144 |
7.3 Gaussian Graphical Models (GGM) | p. 147 |
7.3.1 Connection with the Precision Matrix and the Linear Regression | p. 147 |
7.3.2 Estimating g by Multiple Testing | p. 148 |
7.3.3 Sparse Estimation of the Precision Matrix | p. 149 |
7.3.4 Estimation of g by Regression | p. 150 |
7.4 Practical Issues | p. 154 |
7.5 Discussion and References | p. 155 |
7.5.1 Take-Home Message | p. 155 |
7.5.2 References | p. 155 |
7.6 Exercises | p. 156 |
7.6.1 Factorization in Directed Models | p. 156 |
7.6.2 Moralization of a Directed Graph | p. 156 |
7.6.3 Convexity of -log(det(K)) | p. 157 |
7.6.4 Block Gradient Descent with the ℓ1/ℓ2 Penalty | p. 157 |
7.6.5 Gaussian Graphical Models with Hidden Variables | p. 157 |
7.6.6 Dantzig Estimation of Sparse Gaussian Graphical Models | p. 158 |
7.6.7 Gaussian Copula Graphical Models | p. 160 |
7.6.8 Restricted Isometry Constant for Gaussian Matrices | p. 162 |
8 Multiple Testing | p. 165 |
8.1 An Introductory Example | p. 165 |
8.1.1 Differential Expression of a Single Gene | p. 165 |
8.1.2 Differential Expression of Multiple Genes | p. 166 |
8.2 Statistical Setting | p. 167 |
8.2.1 p-Values | p. 167 |
8.2.2 Multiple Testing Setting | p. 168 |
8.2.3 Bonferroni Correction | p. 169 |
8.3 Controlling the False Discovery Rate | p. 169 |
8.3.1 Heuristics | p. 169 |
8.3.2 Step-Up Procedures | p. 170 |
8.3.3 FDR Control under the WPRDS Property | p. 173 |
8.4 Illustration | p. 175 |
8.5 Discussion and References | p. 177 |
8.5.1 Take-Home Message | p. 177 |
8.5.2 References | p. 177 |
8.6 Exercises | p. 177 |
8.6.1 FDR versus FWER | p. 177 |
8.6.2 WPRDS Property | p. 178 |
8.6.3 Positively Correlated Normal Test Statistics | p. 178 |
9 Supervised Classification | p. 181 |
9.1 Statistical Modeling | p. 181 |
9.1.1 Bayes Classifier | p. 181 |
9.1.2 Parametric Modeling | p. 182 |
9.1.3 Semi-Parametric Modeling | p. 183 |
9.1.4 Nonparametric Modeling | p. 185 |
9.2 Empirical Risk Minimization | p. 185 |
9.2.1 Misclassification Probability of the Empirical Risk Minimizer | p. 186 |
9.2.2 Vapnik-Chervonenkis Dimension | p. 190 |
9.2.3 Dictionary Selection | p. 193 |
9.3 From Theoretical to Practical Classifiers | p. 195 |
9.3.1 Empirical Risk Convexification | p. 195 |
9.3.2 Statistical Properties | p. 198 |
9.3.3 Support Vector Machines | p. 202 |
9.3.4 AdaBoost | p. 206 |
9.3.5 Classifier Selection | p. 207 |
9.4 Discussion and References | p. 208 |
9.4.1 Take-Home Message | p. 208 |
9.4.2 References | p. 208 |
9.5 Exercises | p. 208 |
9.5.1 Linear Discriminant Analysis | p. 208 |
9.5.2 VC Dimension of Linear Classifiers in R^d | p. 209 |
9.5.3 Linear Classifiers with Margin Constraints | p. 209 |
9.5.4 Spectral Kernel | p. 210 |
9.5.5 Computation of the SVM Classifier | p. 210 |
9.5.6 Kernel Principal Component Analysis (KPCA) | p. 211 |
Appendix A Gaussian Distribution | p. 213 |
A.1 Gaussian Random Vectors | p. 213 |
A.2 Chi-Square Distribution | p. 214 |
A.3 Gaussian Conditioning | p. 215 |
Appendix B Probabilistic Inequalities | p. 217 |
B.1 Basic Inequalities | p. 217 |
B.2 Concentration Inequalities | p. 218 |
B.2.1 McDiarmid Inequality | p. 218 |
B.2.2 Gaussian Concentration Inequality | p. 221 |
B.3 Symmetrization and Contraction Lemmas | p. 223 |
B.3.1 Symmetrization Lemma | p. 223 |
B.3.2 Contraction Principle | p. 224 |
B.4 Birgé's Inequality | p. 227 |
Appendix C Linear Algebra | p. 229 |
C.1 Singular Value Decomposition (SVD) | p. 229 |
C.2 Moore-Penrose Pseudo-Inverse | p. 230 |
C.3 Matrix Norms | p. 231 |
C.4 Matrix Analysis | p. 232 |
Appendix D Subdifferentials of Convex Functions | p. 235 |
D.1 Subdifferentials and Subgradients | p. 235 |
D.2 Examples of Subdifferentials | p. 237 |
Appendix E Reproducing Kernel Hilbert Spaces | p. 239 |
Notations | p. 243 |
Bibliography | p. 245 |
Index | p. 253 |