Title:
A Computational Approach to Statistical Learning
Personal Author:
Arnold, Taylor
Series:
Texts in Statistical Science
Physical Description:
xiii, 361 pages : illustrations ; 25 cm.
ISBN:
9781138046375
Abstract:
A Computational Approach to Statistical Learning gives a novel introduction to predictive modeling by focusing on the algorithmic and numeric motivations behind popular statistical methods. The text contains annotated code to over 80 original reference functions. These functions provide minimal working implementations of common statistical learning algorithms. Every chapter concludes with a fully worked out application that illustrates predictive modeling tasks using a real-world dataset. The text begins with a detailed analysis of linear models and ordinary least squares.

Available:

Library | Item Barcode | Call Number | Material Type | Item Category 1 | Status
— | 30000010369665 | Q325.5 A76 2019 | Open Access Book | Book | —
— | 33000000003190 | Q325.5 A76 2019 | Open Access Book | Book | —

On Order

Summary

A Computational Approach to Statistical Learning gives a novel introduction to predictive modeling by focusing on the algorithmic and numeric motivations behind popular statistical methods. The text contains annotated code to over 80 original reference functions. These functions provide minimal working implementations of common statistical learning algorithms. Every chapter concludes with a fully worked out application that illustrates predictive modeling tasks using a real-world dataset.

The text begins with a detailed analysis of linear models and ordinary least squares. Subsequent chapters explore extensions such as ridge regression, generalized linear models, and additive models. The second half focuses on the use of general-purpose algorithms for convex optimization and their application to tasks in statistical learning. Models covered include the elastic net, dense neural networks, convolutional neural networks (CNNs), and spectral clustering. A unifying theme throughout the text is the use of optimization theory in the description of predictive models, with a particular focus on the singular value decomposition (SVD). Through this theme, the computational approach motivates and clarifies the relationships between various predictive models.
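The summary's unifying theme — describing predictive models through the singular value decomposition — can be illustrated with a short sketch. The code below is not from the book (whose reference implementations are in R); it is a minimal numpy illustration, with made-up data, of how a single SVD of the data matrix yields both the ordinary least squares and the ridge regression solutions.

```python
import numpy as np

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# One SVD of X serves both estimators.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# OLS: beta = V diag(1/s) U^T y
beta_ols = Vt.T @ ((U.T @ y) / s)

# Ridge: the same factors, with each singular direction
# shrunk by s / (s^2 + lambda) instead of inverted by 1/s.
lam = 1.0
beta_ridge = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))

print(beta_ols)    # close to beta_true
print(beta_ridge)  # shrunk toward zero
```

Seen this way, ridge regression differs from OLS only in how the singular values are filtered, which is the kind of relationship between models the computational approach is meant to make visible.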

Taylor Arnold is an assistant professor of statistics at the University of Richmond. His work at the intersection of computer vision, natural language processing, and digital humanities has been supported by multiple grants from the National Endowment for the Humanities (NEH) and the American Council of Learned Societies (ACLS). His first book, Humanities Data in R, was published in 2015.

Michael Kane is an assistant professor of biostatistics at Yale University. He is the recipient of grants from the National Institutes of Health (NIH), DARPA, and the Bill and Melinda Gates Foundation. His R package bigmemory won the Chambers' prize for statistical software in 2010.

Bryan W. Lewis is an applied mathematician and author of many popular R packages, including irlba, doRedis, and threejs.


Table of Contents

Preface p. xi
1 Introduction p. 1
1.1 Computational approach p. 1
1.2 Statistical learning p. 2
1.3 Example p. 3
1.4 Prerequisites p. 5
1.5 How to read this book p. 6
1.6 Supplementary materials p. 7
1.7 Formalisms and terminology p. 7
1.8 Exercises p. 9
2 Linear Models p. 11
2.1 Introduction p. 11
2.2 Ordinary least squares p. 13
2.3 The normal equations p. 15
2.4 Solving least squares with the singular value decomposition p. 17
2.5 Directly solving the linear system p. 19
2.6 (*) Solving linear models using the QR decomposition p. 22
2.7 (*) Sensitivity analysis p. 24
2.8 (*) Relationship between numerical and statistical error p. 28
2.9 Implementation and notes p. 31
2.10 Application: Cancer incidence rates p. 32
2.11 Exercises p. 40
3 Ridge Regression and Principal Component Analysis p. 43
3.1 Variance in OLS p. 43
3.2 Ridge regression p. 46
3.3 (*) A Bayesian perspective p. 53
3.4 Principal component analysis p. 56
3.5 Implementation and notes p. 63
3.6 Application: NYC taxicab data p. 65
3.7 Exercises p. 72
4 Linear Smoothers p. 75
4.1 Non-linearity p. 75
4.2 Basis expansion p. 76
4.3 Kernel regression p. 81
4.4 Local regression p. 85
4.5 Regression splines p. 89
4.6 (*) Smoothing splines p. 95
4.7 (*) B-splines p. 100
4.8 Implementation and notes p. 104
4.9 Application: U.S. census tract data p. 105
4.10 Exercises p. 120
5 Generalized Linear Models p. 123
5.1 Classification with linear models p. 123
5.2 Exponential families p. 128
5.3 Iteratively reweighted GLMs p. 131
5.4 (*) Numerical issues p. 135
5.5 (*) Multi-class regression p. 138
5.6 Implementation and notes p. 139
5.7 Application: Chicago crime prediction p. 140
5.8 Exercises p. 148
6 Additive Models p. 151
6.1 Multivariate linear smoothers p. 151
6.2 Curse of dimensionality p. 155
6.3 Additive models p. 158
6.4 (*) Additive models as linear models p. 163
6.5 (*) Standard errors in additive models p. 166
6.6 Implementation and notes p. 170
6.7 Application: NYC flights data p. 172
6.8 Exercises p. 178
7 Penalized Regression Models p. 179
7.1 Variable selection p. 179
7.2 Penalized regression with the l0- and l1-norms p. 180
7.3 Orthogonal data matrix p. 182
7.4 Convex optimization and the elastic net p. 186
7.5 Coordinate descent p. 188
7.6 (*) Active set screening using the KKT conditions p. 193
7.7 (*) The generalized elastic net model p. 198
7.8 Implementation and notes p. 200
7.9 Application: Amazon product reviews p. 201
7.10 Exercises p. 206
8 Neural Networks p. 207
8.1 Dense neural network architecture p. 207
8.2 Stochastic gradient descent p. 211
8.3 Backward propagation of errors p. 213
8.4 Implementing backpropagation p. 216
8.5 Recognizing handwritten digits p. 224
8.6 (*) Improving SGD and regularization p. 226
8.7 (*) Classification with neural networks p. 232
8.8 (*) Convolutional neural networks p. 239
8.9 Implementation and notes p. 249
8.10 Application: Image classification with EMNIST p. 249
8.11 Exercises p. 259
9 Dimensionality Reduction p. 261
9.1 Unsupervised learning p. 261
9.2 Kernel functions p. 262
9.3 Kernel principal component analysis p. 266
9.4 Spectral clustering p. 272
9.5 t-Distributed stochastic neighbor embedding (t-SNE) p. 277
9.6 Autoencoders p. 282
9.7 Implementation and notes p. 283
9.8 Application: Classifying and visualizing fashion MNIST p. 284
9.9 Exercises p. 295
10 Computation in Practice p. 297
10.1 Reference implementations p. 297
10.2 Sparse matrices p. 298
10.3 Sparse generalized linear models p. 304
10.4 Computation on row chunks p. 307
10.5 Feature hashing p. 311
10.6 Data quality issues p. 318
10.7 Implementation and notes p. 320
10.8 Application p. 321
10.9 Exercises p. 329
A Linear Algebra and Matrices p. 331
A.1 Vector spaces p. 331
A.2 Matrices p. 333
B Floating Point Arithmetic and Numerical Computation p. 337
B.1 Floating point arithmetic p. 337
B.2 Computational effort p. 340
Bibliography p. 343
Index p. 359