Item Barcode | Call Number | Material Type | Item Category |
---|---|---|---|
30000010369665 | Q325.5 A76 2019 | Open Access Book | Book |
33000000003190 | Q325.5 A76 2019 | Open Access Book | Book |
On Order
Summary
A Computational Approach to Statistical Learning gives a novel introduction to predictive modeling by focusing on the algorithmic and numeric motivations behind popular statistical methods. The text contains annotated code for over 80 original reference functions. These functions provide minimal working implementations of common statistical learning algorithms. Every chapter concludes with a fully worked-out application that illustrates predictive modeling tasks using a real-world dataset.
The text begins with a detailed analysis of linear models and ordinary least squares. Subsequent chapters explore extensions such as ridge regression, generalized linear models, and additive models. The second half focuses on the use of general-purpose algorithms for convex optimization and their application to tasks in statistical learning. Models covered include the elastic net, dense neural networks, convolutional neural networks (CNNs), and spectral clustering. A unifying theme throughout the text is the use of optimization theory in the description of predictive models, with a particular focus on the singular value decomposition (SVD). Through this theme, the computational approach motivates and clarifies the relationships between various predictive models.
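The SVD-based approach to least squares that the summary describes (covered in section 2.4 of the book) can be sketched briefly. The book's reference functions are written in R; this NumPy version is only an illustrative analogue, with made-up example data:

```python
import numpy as np

def ols_svd(X, y):
    """Least-squares coefficients via the SVD of the data matrix X.

    Writing X = U diag(s) V^T, the minimum-norm least-squares solution
    is beta = V diag(1/s) U^T y (the pseudo-inverse applied to y).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((U.T @ y) / s)

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=100)

beta_hat = ols_svd(X, y)
```

Compared with solving the normal equations directly, the SVD route is numerically more stable for ill-conditioned data matrices, which is one reason the book returns to the decomposition repeatedly.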
Author Notes
Taylor Arnold is an assistant professor of statistics at the University of Richmond. His work at the intersection of computer vision, natural language processing, and digital humanities has been supported by multiple grants from the National Endowment for the Humanities and the American Council of Learned Societies. His first book, Humanities Data in R, was published in 2015.
Michael Kane is an assistant professor of biostatistics at Yale University. He is the recipient of grants from the National Institutes of Health, DARPA, and the Bill and Melinda Gates Foundation. His R package bigmemory won the Chambers prize for statistical software in 2010.
Bryan W. Lewis is an applied mathematician and author of many popular R packages, including irlba, doRedis, and threejs.
Table of Contents
Preface | p. xi |
1 Introduction | p. 1 |
1.1 Computational approach | p. 1 |
1.2 Statistical learning | p. 2 |
1.3 Example | p. 3 |
1.4 Prerequisites | p. 5 |
1.5 How to read this book | p. 6 |
1.6 Supplementary materials | p. 7 |
1.7 Formalisms and terminology | p. 7 |
1.8 Exercises | p. 9 |
2 Linear Models | p. 11 |
2.1 Introduction | p. 11 |
2.2 Ordinary least squares | p. 13 |
2.3 The normal equations | p. 15 |
2.4 Solving least squares with the singular value decomposition | p. 17 |
2.5 Directly solving the linear system | p. 19 |
2.6 (*) Solving linear models using the QR decomposition | p. 22 |
2.7 (*) Sensitivity analysis | p. 24 |
2.8 (*) Relationship between numerical and statistical error | p. 28 |
2.9 Implementation and notes | p. 31 |
2.10 Application: Cancer incidence rates | p. 32 |
2.11 Exercises | p. 40 |
3 Ridge Regression and Principal Component Analysis | p. 43 |
3.1 Variance in OLS | p. 43 |
3.2 Ridge regression | p. 46 |
3.3 (*) A Bayesian perspective | p. 53 |
3.4 Principal component analysis | p. 56 |
3.5 Implementation and notes | p. 63 |
3.6 Application: NYC taxicab data | p. 65 |
3.7 Exercises | p. 72 |
4 Linear Smoothers | p. 75 |
4.1 Non-linearity | p. 75 |
4.2 Basis expansion | p. 76 |
4.3 Kernel regression | p. 81 |
4.4 Local regression | p. 85 |
4.5 Regression splines | p. 89 |
4.6 (*) Smoothing splines | p. 95 |
4.7 (*) B-splines | p. 100 |
4.8 Implementation and notes | p. 104 |
4.9 Application: U.S. census tract data | p. 105 |
4.10 Exercises | p. 120 |
5 Generalized Linear Models | p. 123 |
5.1 Classification with linear models | p. 123 |
5.2 Exponential families | p. 128 |
5.3 Iteratively reweighted GLMs | p. 131 |
5.4 (*) Numerical issues | p. 135 |
5.5 (*) Multi-class regression | p. 138 |
5.6 Implementation and notes | p. 139 |
5.7 Application: Chicago crime prediction | p. 140 |
5.8 Exercises | p. 148 |
6 Additive Models | p. 151 |
6.1 Multivariate linear smoothers | p. 151 |
6.2 Curse of dimensionality | p. 155 |
6.3 Additive models | p. 158 |
6.4 (*) Additive models as linear models | p. 163 |
6.5 (*) Standard errors in additive models | p. 166 |
6.6 Implementation and notes | p. 170 |
6.7 Application: NYC flights data | p. 172 |
6.8 Exercises | p. 178 |
7 Penalized Regression Models | p. 179 |
7.1 Variable selection | p. 179 |
7.2 Penalized regression with the ℓ0- and ℓ1-norms | p. 180 |
7.3 Orthogonal data matrix | p. 182 |
7.4 Convex optimization and the elastic net | p. 186 |
7.5 Coordinate descent | p. 188 |
7.6 (*) Active set screening using the KKT conditions | p. 193 |
7.7 (*) The generalized elastic net model | p. 198 |
7.8 Implementation and notes | p. 200 |
7.9 Application: Amazon product reviews | p. 201 |
7.10 Exercises | p. 206 |
8 Neural Networks | p. 207 |
8.1 Dense neural network architecture | p. 207 |
8.2 Stochastic gradient descent | p. 211 |
8.3 Backward propagation of errors | p. 213 |
8.4 Implementing backpropagation | p. 216 |
8.5 Recognizing handwritten digits | p. 224 |
8.6 (*) Improving SGD and regularization | p. 226 |
8.7 (*) Classification with neural networks | p. 232 |
8.8 (*) Convolutional neural networks | p. 239 |
8.9 Implementation and notes | p. 249 |
8.10 Application: Image classification with EMNIST | p. 249 |
8.11 Exercises | p. 259 |
9 Dimensionality Reduction | p. 261 |
9.1 Unsupervised learning | p. 261 |
9.2 Kernel functions | p. 262 |
9.3 Kernel principal component analysis | p. 266 |
9.4 Spectral clustering | p. 272 |
9.5 t-Distributed stochastic neighbor embedding (t-SNE) | p. 277 |
9.6 Autoencoders | p. 282 |
9.7 Implementation and notes | p. 283 |
9.8 Application: Classifying and visualizing fashion MNIST | p. 284 |
9.9 Exercises | p. 295 |
10 Computation in Practice | p. 297 |
10.1 Reference implementations | p. 297 |
10.2 Sparse matrices | p. 298 |
10.3 Sparse generalized linear models | p. 304 |
10.4 Computation on row chunks | p. 307 |
10.5 Feature hashing | p. 311 |
10.6 Data quality issues | p. 318 |
10.7 Implementation and notes | p. 320 |
10.8 Application | p. 321 |
10.9 Exercises | p. 329 |
A Linear algebra and matrices | p. 331 |
A.1 Vector spaces | p. 331 |
A.2 Matrices | p. 333 |
B Floating Point Arithmetic and Numerical Computation | p. 337 |
B.1 Floating point arithmetic | p. 337 |
B.2 Computational effort | p. 340 |
Bibliography | p. 343 |
Index | p. 359 |