Title:
Foundations of predictive analytics
Personal Author:
Wu, James
Series:
Chapman & Hall/CRC data mining and knowledge discovery series
Publication Information:
Boca Raton, FL : CRC Press, c2012
Physical Description:
xix, 317 p. : ill. ; 24 cm.
ISBN:
9781439869468
Added Author:
Coggeshall, Stephen

Available:

Item Barcode      Call Number             Material Type      Item Category
30000010329023    QA76.9.D343 W85 2012    Open Access Book   Book

Summary

Drawing on the authors' two decades of experience in applied modeling and data mining, Foundations of Predictive Analytics presents the fundamental background required for analyzing data and building models for practical applications such as consumer behavior modeling and risk and marketing analytics. It also discusses a variety of practical topics that are frequently missing from similar texts.

The book begins with the statistical and linear algebra/matrix foundations of modeling methods, from distributions to cumulant and copula functions to the Cornish-Fisher expansion and other useful but hard-to-find statistical techniques. It then describes common and unusual linear methods as well as popular nonlinear modeling approaches, including additive models, trees, support vector machines, fuzzy systems, clustering, naïve Bayes, and neural nets. The authors go on to cover methodologies used in time series analysis and forecasting, such as ARIMA, GARCH, and survival analysis. They also present a range of optimization techniques and explore several special topics, such as Dempster-Shafer theory.

An in-depth collection of the most important fundamental material on predictive analytics, this self-contained book provides the necessary information for understanding various techniques for exploratory data analysis and modeling. It explains the algorithmic details behind each technique (including underlying assumptions and mathematical formulations) and shows how to prepare and encode data, select variables, use model goodness measures, normalize odds, and perform reject inference.

Web Resource
The book's website at www.DataMinerXL.com offers the DataMinerXL software for building predictive models. The site also includes more examples and information on modeling.


Author Notes

James Wu is a fixed income quant with extensive experience developing applied analytical solutions in consumer behavior modeling and financial engineering. He previously worked at ID Analytics, Morgan Stanley, JPMorgan Chase, Los Alamos Computational Group, and CASA. He earned a PhD from the University of Idaho.

Stephen Coggeshall is the Chief Technology Officer of ID Analytics. He previously worked at Los Alamos Computational Group, Morgan Stanley, HNC Software, CASA, and Los Alamos National Laboratory. During a career spanning more than 20 years, Dr. Coggeshall has helped teams of scientists develop practical solutions to difficult business problems using advanced analytics. He earned a PhD from the University of Illinois and was named 2008 Technology Executive of the Year by the San Diego Business Journal.


Table of Contents

List of Figures
List of Tables
Preface
1 Introduction
1.1 What Is a Model?
1.2 What Is a Statistical Model?
1.3 The Modeling Process
1.4 Modeling Pitfalls
1.5 Characteristics of Good Modelers
1.6 The Future of Predictive Analytics
2 Properties of Statistical Distributions
2.1 Fundamental Distributions
2.1.1 Uniform Distribution
2.1.2 Details of the Normal (Gaussian) Distribution
2.1.3 Lognormal Distribution
2.1.4 Gamma Distribution
2.1.5 Chi-Squared Distribution
2.1.6 Non-Central Chi-Squared Distribution
2.1.7 Student's t-Distribution
2.1.8 Multivariate t-Distribution
2.1.9 F-Distribution
2.1.10 Binomial Distribution
2.1.11 Poisson Distribution
2.1.12 Exponential Distribution
2.1.13 Geometric Distribution
2.1.14 Hypergeometric Distribution
2.1.15 Negative Binomial Distribution
2.1.16 Inverse Gaussian (IG) Distribution
2.1.17 Normal Inverse Gaussian (NIG) Distribution
2.2 Central Limit Theorem
2.3 Estimate of Mean, Variance, Skewness, and Kurtosis from Sample Data
2.4 Estimate of the Standard Deviation of the Sample Mean
2.5 (Pseudo) Random Number Generators
2.5.1 Mersenne Twister Pseudorandom Number Generator
2.5.2 Box-Muller Transform for Generating a Normal Distribution
2.6 Transformation of a Distribution Function
2.7 Distribution of a Function of Random Variables
2.7.1 Z = X + Y
2.7.2 Z = XY
2.7.3 (Z₁, Z₂, ..., Zₙ) = (X₁, X₂, ..., Xₙ) Y
2.7.4 Z = X/Y
2.7.5 Z = max(X, Y)
2.7.6 Z = min(X, Y)
2.8 Moment Generating Function
2.8.1 Moment Generating Function of Binomial Distribution
2.8.2 Moment Generating Function of Normal Distribution
2.8.3 Moment Generating Function of the Gamma Distribution
2.8.4 Moment Generating Function of Chi-Square Distribution
2.8.5 Moment Generating Function of the Poisson Distribution
2.9 Cumulant Generating Function
2.10 Characteristic Function
2.10.1 Relationship between the Cumulative Distribution Function and the Characteristic Function
2.10.2 Characteristic Function of Normal Distribution
2.10.3 Characteristic Function of Gamma Distribution
2.11 Chebyshev's Inequality
2.12 Markov's Inequality
2.13 Gram-Charlier Series
2.14 Edgeworth Expansion
2.15 Cornish-Fisher Expansion
2.15.1 Lagrange Inversion Theorem
2.15.2 Cornish-Fisher Expansion
2.16 Copula Functions
2.16.1 Gaussian Copula
2.16.2 t-Copula
2.16.3 Archimedean Copula
3 Important Matrix Relationships
3.1 Pseudo-Inverse of a Matrix
3.2 A Lemma of Matrix Inversion
3.3 Identity for a Matrix Determinant
3.4 Inversion of Partitioned Matrix
3.5 Determinant of Partitioned Matrix
3.6 Matrix Sweep and Partial Correlation
3.7 Singular Value Decomposition (SVD)
3.8 Diagonalization of a Matrix
3.9 Spectral Decomposition of a Positive Semi-Definite Matrix
3.10 Normalization in Vector Space
3.11 Conjugate Decomposition of a Symmetric Definite Matrix
3.12 Cholesky Decomposition
3.13 Cauchy-Schwarz Inequality
3.14 Relationship of Correlation among Three Variables
4 Linear Modeling and Regression
4.1 Properties of Maximum Likelihood Estimators
4.1.1 Likelihood Ratio Test
4.1.2 Wald Test
4.1.3 Lagrange Multiplier Statistic
4.2 Linear Regression
4.2.1 Ordinary Least Squares (OLS) Regression
4.2.2 Interpretation of the Coefficients of Linear Regression
4.2.3 Regression on Weighted Data
4.2.4 Incrementally Updating a Regression Model with Additional Data
4.2.5 Partitioned Regression
4.2.6 How Does the Regression Change When Adding One More Variable?
4.2.7 Linearly Restricted Least Squares Regression
4.2.8 Significance of the Correlation Coefficient
4.2.9 Partial Correlation
4.2.10 Ridge Regression
4.3 Fisher's Linear Discriminant Analysis
4.4 Principal Component Regression (PCR)
4.5 Factor Analysis
4.6 Partial Least Squares Regression (PLSR)
4.7 Generalized Linear Model (GLM)
4.8 Logistic Regression: Binary
4.9 Logistic Regression: Multiple Nominal
4.10 Logistic Regression: Proportional Multiple Ordinal
4.11 Fisher Scoring Method for Logistic Regression
4.12 Tobit Model: A Censored Regression Model
4.12.1 Some Properties of the Normal Distribution
4.12.2 Formulation of the Tobit Model
5 Nonlinear Modeling
5.1 Naive Bayesian Classifier
5.2 Neural Network
5.2.1 Back Propagation Neural Network
5.3 Segmentation and Tree Models
5.3.1 Segmentation
5.3.2 Tree Models
5.3.3 Sweeping to Find the Best Cutpoint
5.3.4 Impurity Measure of a Population: Entropy and Gini Index
5.3.5 Chi-Square Splitting Rule
5.3.6 Implementation of Decision Trees
5.4 Additive Models
5.4.1 Boosted Tree
5.4.2 Least Squares Regression Boosting Tree
5.4.3 Binary Logistic Regression Boosting Tree
5.5 Support Vector Machine (SVM)
5.5.1 Wolfe Dual
5.5.2 Linearly Separable Problem
5.5.3 Linearly Inseparable Problem
5.5.4 Constructing Higher-Dimensional Space and Kernel
5.5.5 Model Output
5.5.6 C-Support Vector Classification (C-SVC) for Classification
5.5.7 ε-Support Vector Regression (ε-SVR) for Regression
5.5.8 The Probability Estimate
5.6 Fuzzy Logic System
5.6.1 A Simple Fuzzy Logic System
5.7 Clustering
5.7.1 K Means, Fuzzy C Means
5.7.2 Nearest Neighbor, K Nearest Neighbor (KNN)
5.7.3 Comments on Clustering Methods
6 Time Series Analysis
6.1 Fundamentals of Forecasting
6.1.1 Box-Cox Transformation
6.1.2 Smoothing Algorithms
6.1.3 Convolution of Linear Filters
6.1.4 Linear Difference Equation
6.1.5 The Autocovariance Function and Autocorrelation Function
6.1.6 The Partial Autocorrelation Function
6.2 ARIMA Models
6.2.1 MA(q) Process
6.2.2 AR(p) Process
6.2.3 ARMA(p, q) Process
6.3 Survival Data Analysis
6.3.1 Sampling Method
6.4 Exponentially Weighted Moving Average (EWMA) and GARCH(1, 1)
6.4.1 Exponentially Weighted Moving Average (EWMA)
6.4.2 ARCH and GARCH Models
7 Data Preparation and Variable Selection
7.1 Data Quality and Exploration
7.2 Variable Scaling and Transformation
7.3 How to Bin Variables
7.3.1 Equal Interval
7.3.2 Equal Population
7.3.3 Tree Algorithms
7.4 Interpolation in One and Two Dimensions
7.5 Weight of Evidence (WOE) Transformation
7.6 Variable Selection Overview
7.7 Missing Data Imputation
7.8 Stepwise Selection Methods
7.8.1 Forward Selection in Linear Regression
7.8.2 Forward Selection in Logistic Regression
7.9 Mutual Information, KL Distance
7.10 Detection of Multicollinearity
8 Model Goodness Measures
8.1 Training, Testing, Validation
8.2 Continuous Dependent Variable
8.2.1 Example: Linear Regression
8.3 Binary Dependent Variable (Two-Group Classification)
8.3.1 Kolmogorov-Smirnov (KS) Statistic
8.3.2 Confusion Matrix
8.3.3 Concordant and Discordant
8.3.4 R² for Logistic Regression
8.3.5 AIC and SBC
8.3.6 Hosmer-Lemeshow Goodness-of-Fit Test
8.3.7 Example: Logistic Regression
8.4 Population Stability Index Using Relative Entropy
9 Optimization Methods
9.1 Lagrange Multiplier
9.2 Gradient Descent Method
9.3 Newton-Raphson Method
9.4 Conjugate Gradient Method
9.5 Quasi-Newton Method
9.6 Genetic Algorithms (GA)
9.7 Simulated Annealing
9.8 Linear Programming
9.9 Nonlinear Programming (NLP)
9.9.1 General Nonlinear Programming (GNLP)
9.9.2 Lagrange Dual Problem
9.9.3 Quadratic Programming (QP)
9.9.4 Linear Complementarity Programming (LCP)
9.9.5 Sequential Quadratic Programming (SQP)
9.10 Nonlinear Equations
9.11 Expectation-Maximization (EM) Algorithm
9.12 Optimal Design of Experiment
10 Miscellaneous Topics
10.1 Multidimensional Scaling
10.2 Simulation
10.3 Odds Normalization and Score Transformation
10.4 Reject Inference
10.5 Dempster-Shafer Theory of Evidence
10.5.1 Some Properties in Set Theory
10.5.2 Basic Probability Assignment, Belief Function, and Plausibility Function
10.5.3 Dempster-Shafer's Rule of Combination
10.5.4 Applications of Dempster-Shafer Theory of Evidence: Multiple Classifier Function
Appendix A Useful Mathematical Relations
A.1 Information Inequality
A.2 Relative Entropy
A.3 Saddle-Point Method
A.4 Stirling's Formula
A.5 Convex Function and Jensen's Inequality
Appendix B DataMinerXL - Microsoft Excel Add-In for Building Predictive Models
B.1 Overview
B.2 Utility Functions
B.3 Data Manipulation Functions
B.4 Basic Statistical Functions
B.5 Modeling Functions for All Models
B.6 Weight of Evidence Transformation Functions
B.7 Linear Regression Functions
B.8 Partial Least Squares Regression Functions
B.9 Logistic Regression Functions
B.10 Time Series Analysis Functions
B.11 Naive Bayes Classifier Functions
B.12 Tree-Based Model Functions
B.13 Clustering and Segmentation Functions
B.14 Neural Network Functions
B.15 Support Vector Machine Functions
B.16 Optimization Functions
B.17 Matrix Operation Functions
B.18 Numerical Integration Functions
B.19 Excel Built-in Statistical Distribution Functions
Bibliography
Index