Item Barcode | Call Number | Material Type | Item Category 1 | Status |
---|---|---|---|---|
30000010329023 | QA76.9.D343 W85 2012 | Open Access Book | Book | On Order |
Summary
Drawing on the authors' two decades of experience in applied modeling and data mining, Foundations of Predictive Analytics presents the fundamental background required for analyzing data and building models for many practical applications, such as consumer behavior modeling, risk and marketing analytics, and other areas. It also discusses a variety of practical topics that are frequently missing from similar texts.
The book begins with the statistical and linear algebra/matrix foundations of modeling methods, from distributions to cumulant and copula functions to the Cornish-Fisher expansion and other useful but hard-to-find statistical techniques. It then describes common and unusual linear methods as well as popular nonlinear modeling approaches, including additive models, trees, support vector machines, fuzzy systems, clustering, naïve Bayes, and neural nets. The authors go on to cover methodologies used in time series and forecasting, such as ARIMA, GARCH, and survival analysis. They also present a range of optimization techniques and explore several special topics, such as Dempster-Shafer theory.
An in-depth collection of the most important fundamental material on predictive analytics, this self-contained book provides the necessary information for understanding various techniques for exploratory data analysis and modeling. It explains the algorithmic details behind each technique (including underlying assumptions and mathematical formulations) and shows how to prepare and encode data, select variables, use model goodness measures, normalize odds, and perform reject inference.
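As a flavor of the techniques the book surveys, binary logistic regression (a Chapter 4 topic) can be fit with plain gradient descent (a Chapter 9 topic). The sketch below is a minimal, illustrative pure-Python example on toy data; it is not taken from the book or from the DataMinerXL software, and the function names are this reviewer's own:

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a one-variable logistic regression y ~ sigmoid(w*x + b)
    by batch gradient descent on the log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (p - y) * x   # gradient w.r.t. w
            gb += (p - y)       # gradient w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Toy data: the outcome flips from 0 to 1 around x = 2.
xs = [0.0, 0.5, 1.0, 1.5, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

After fitting, `sigmoid(w * x + b)` gives the predicted probability of the positive class at `x`; the fitted slope is positive here because the outcome rises with `x`.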
Web Resource
The book's website at www.DataMinerXL.com offers the DataMinerXL software for building predictive models. The site also includes more examples and information on modeling.
Author Notes
James Wu is a Fixed Income Quant with extensive expertise in a wide variety of applied analytical solutions in consumer behavior modeling and financial engineering. He previously worked at ID Analytics, Morgan Stanley, JPMorgan Chase, Los Alamos Computational Group, and CASA. He earned a PhD from the University of Idaho.
Stephen Coggeshall is the Chief Technology Officer of ID Analytics. He previously worked at Los Alamos Computational Group, Morgan Stanley, HNC Software, CASA, and Los Alamos National Laboratory. Over his more than 20-year career, Dr. Coggeshall has helped teams of scientists develop practical solutions to difficult business problems using advanced analytics. He earned a PhD from the University of Illinois and was named 2008 Technology Executive of the Year by the San Diego Business Journal.
Table of Contents
List of Figures | p. xv |
List of Tables | p. xvii |
Preface | p. xix |
1 Introduction | p. 1 |
1.1 What Is a Model? | p. 1 |
1.2 What Is a Statistical Model? | p. 2 |
1.3 The Modeling Process | p. 3 |
1.4 Modeling Pitfalls | p. 4 |
1.5 Characteristics of Good Modelers | p. 5 |
1.6 The Future of Predictive Analytics | p. 7 |
2 Properties of Statistical Distributions | p. 9 |
2.1 Fundamental Distributions | p. 9 |
2.1.1 Uniform Distribution | p. 9 |
2.1.2 Details of the Normal (Gaussian) Distribution | p. 10 |
2.1.3 Lognormal Distribution | p. 19 |
2.1.4 Gamma Distribution | p. 20 |
2.1.5 Chi-Squared Distribution | p. 22 |
2.1.6 Non-Central Chi-Squared Distribution | p. 25 |
2.1.7 Student's t-Distribution | p. 28 |
2.1.8 Multivariate t-Distribution | p. 29 |
2.1.9 F-Distribution | p. 31 |
2.1.10 Binomial Distribution | p. 31 |
2.1.11 Poisson Distribution | p. 32 |
2.1.12 Exponential Distribution | p. 32 |
2.1.13 Geometric Distribution | p. 33 |
2.1.14 Hypergeometric Distribution | p. 33 |
2.1.15 Negative Binomial Distribution | p. 34 |
2.1.16 Inverse Gaussian (IG) Distribution | p. 35 |
2.1.17 Normal Inverse Gaussian (NIG) Distribution | p. 36 |
2.2 Central Limit Theorem | p. 38 |
2.3 Estimate of Mean, Variance, Skewness, and Kurtosis from Sample Data | p. 40 |
2.4 Estimate of the Standard Deviation of the Sample Mean | p. 40 |
2.5 (Pseudo) Random Number Generators | p. 41 |
2.5.1 Mersenne Twister Pseudorandom Number Generator | p. 42 |
2.5.2 Box-Muller Transform for Generating a Normal Distribution | p. 42 |
2.6 Transformation of a Distribution Function | p. 43 |
2.7 Distribution of a Function of Random Variables | p. 43 |
2.7.1 Z = X + Y | p. 44 |
2.7.2 Z = XY | p. 44 |
2.7.3 (Z1, Z2, ..., Zn) = (X1, X2, ..., Xn)Y | p. 44 |
2.7.4 Z = X/Y | p. 45 |
2.7.5 Z = max(X,Y) | p. 45 |
2.7.6 Z = min(X,Y) | p. 45 |
2.8 Moment Generating Function | p. 46 |
2.8.1 Moment Generating Function of Binomial Distribution | p. 46 |
2.8.2 Moment Generating Function of Normal Distribution | p. 47 |
2.8.3 Moment Generating Function of the Gamma Distribution | p. 47 |
2.8.4 Moment Generating Function of Chi-Square Distribution | p. 47 |
2.8.5 Moment Generating Function of the Poisson Distribution | p. 48 |
2.9 Cumulant Generating Function | p. 48 |
2.10 Characteristic Function | p. 50 |
2.10.1 Relationship between Cumulative Function and Characteristic Function | p. 51 |
2.10.2 Characteristic Function of Normal Distribution | p. 52 |
2.10.3 Characteristic Function of Gamma Distribution | p. 52 |
2.11 Chebyshev's Inequality | p. 53 |
2.12 Markov's Inequality | p. 54 |
2.13 Gram-Charlier Series | p. 54 |
2.14 Edgeworth Expansion | p. 55 |
2.15 Cornish-Fisher Expansion | p. 56 |
2.15.1 Lagrange Inversion Theorem | p. 56 |
2.15.2 Cornish-Fisher Expansion | p. 57 |
2.16 Copula Functions | p. 58 |
2.16.1 Gaussian Copula | p. 60 |
2.16.2 t-Copula | p. 61 |
2.16.3 Archimedean Copula | p. 62 |
3 Important Matrix Relationships | p. 63 |
3.1 Pseudo-Inverse of a Matrix | p. 63 |
3.2 A Lemma of Matrix Inversion | p. 64 |
3.3 Identity for a Matrix Determinant | p. 66 |
3.4 Inversion of Partitioned Matrix | p. 66 |
3.5 Determinant of Partitioned Matrix | p. 67 |
3.6 Matrix Sweep and Partial Correlation | p. 67 |
3.7 Singular Value Decomposition (SVD) | p. 69 |
3.8 Diagonalization of a Matrix | p. 71 |
3.9 Spectral Decomposition of a Positive Semi-Definite Matrix | p. 75 |
3.10 Normalization in Vector Space | p. 76 |
3.11 Conjugate Decomposition of a Symmetric Definite Matrix | p. 77 |
3.12 Cholesky Decomposition | p. 77 |
3.13 Cauchy-Schwarz Inequality | p. 80 |
3.14 Relationship of Correlation among Three Variables | p. 81 |
4 Linear Modeling and Regression | p. 83 |
4.1 Properties of Maximum Likelihood Estimators | p. 84 |
4.1.1 Likelihood Ratio Test | p. 87 |
4.1.2 Wald Test | p. 87 |
4.1.3 Lagrange Multiplier Statistic | p. 88 |
4.2 Linear Regression | p. 88 |
4.2.1 Ordinary Least Squares (OLS) Regression | p. 89 |
4.2.2 Interpretation of the Coefficients of Linear Regression | p. 95 |
4.2.3 Regression on Weighted Data | p. 97 |
4.2.4 Incrementally Updating a Regression Model with Additional Data | p. 100 |
4.2.5 Partitioned Regression | p. 101 |
4.2.6 How Does the Regression Change When Adding One More Variable? | p. 101 |
4.2.7 Linearly Restricted Least Squares Regression | p. 103 |
4.2.8 Significance of the Correlation Coefficient | p. 105 |
4.2.9 Partial Correlation | p. 105 |
4.2.10 Ridge Regression | p. 105 |
4.3 Fisher's Linear Discriminant Analysis | p. 106 |
4.4 Principal Component Regression (PCR) | p. 109 |
4.5 Factor Analysis | p. 110 |
4.6 Partial Least Squares Regression (PLSR) | p. 111 |
4.7 Generalized Linear Model (GLM) | p. 113 |
4.8 Logistic Regression: Binary | p. 116 |
4.9 Logistic Regression: Multiple Nominal | p. 119 |
4.10 Logistic Regression: Proportional Multiple Ordinal | p. 121 |
4.11 Fisher Scoring Method for Logistic Regression | p. 123 |
4.12 Tobit Model: A Censored Regression Model | p. 125 |
4.12.1 Some Properties of the Normal Distribution | p. 125 |
4.12.2 Formulation of the Tobit Model | p. 126 |
5 Nonlinear Modeling | p. 129 |
5.1 Naive Bayesian Classifier | p. 129 |
5.2 Neural Network | p. 131 |
5.2.1 Back Propagation Neural Network | p. 131 |
5.3 Segmentation and Tree Models | p. 137 |
5.3.1 Segmentation | p. 137 |
5.3.2 Tree Models | p. 138 |
5.3.3 Sweeping to Find the Best Cutpoint | p. 140 |
5.3.4 Impurity Measure of a Population: Entropy and Gini Index | p. 143 |
5.3.5 Chi-Square Splitting Rule | p. 147 |
5.3.6 Implementation of Decision Trees | p. 148 |
5.4 Additive Models | p. 151 |
5.4.1 Boosted Tree | p. 153 |
5.4.2 Least Squares Regression Boosting Tree | p. 154 |
5.4.3 Binary Logistic Regression Boosting Tree | p. 155 |
5.5 Support Vector Machine (SVM) | p. 158 |
5.5.1 Wolfe Dual | p. 158 |
5.5.2 Linearly Separable Problem | p. 159 |
5.5.3 Linearly Inseparable Problem | p. 161 |
5.5.4 Constructing Higher-Dimensional Space and Kernel | p. 162 |
5.5.5 Model Output | p. 163 |
5.5.6 C-Support Vector Classification (C-SVC) for Classification | p. 164 |
5.5.7 ε-Support Vector Regression (ε-SVR) for Regression | p. 164 |
5.5.8 The Probability Estimate | p. 167 |
5.6 Fuzzy Logic System | p. 168 |
5.6.1 A Simple Fuzzy Logic System | p. 168 |
5.7 Clustering | p. 169 |
5.7.1 K Means, Fuzzy C Means | p. 170 |
5.7.2 Nearest Neighbor, K Nearest Neighbor (KNN) | p. 171 |
5.7.3 Comments on Clustering Methods | p. 171 |
6 Time Series Analysis | p. 173 |
6.1 Fundamentals of Forecasting | p. 173 |
6.1.1 Box-Cox Transformation | p. 174 |
6.1.2 Smoothing Algorithms | p. 175 |
6.1.3 Convolution of Linear Filters | p. 176 |
6.1.4 Linear Difference Equation | p. 177 |
6.1.5 The Autocovariance Function and Autocorrelation Function | p. 178 |
6.1.6 The Partial Autocorrelation Function | p. 179 |
6.2 ARIMA Models | p. 181 |
6.2.1 MA(q) Process | p. 182 |
6.2.2 AR(p) Process | p. 184 |
6.2.3 ARMA(p, q) Process | p. 186 |
6.3 Survival Data Analysis | p. 187 |
6.3.1 Sampling Method | p. 190 |
6.4 Exponentially Weighted Moving Average (EWMA) and GARCH(1, 1) | p. 191 |
6.4.1 Exponentially Weighted Moving Average (EWMA) | p. 191 |
6.4.2 ARCH and GARCH Models | p. 192 |
7 Data Preparation and Variable Selection | p. 195 |
7.1 Data Quality and Exploration | p. 196 |
7.2 Variable Scaling and Transformation | p. 197 |
7.3 How to Bin Variables | p. 197 |
7.3.1 Equal Interval | p. 198 |
7.3.2 Equal Population | p. 198 |
7.3.3 Tree Algorithms | p. 199 |
7.4 Interpolation in One and Two Dimensions | p. 199 |
7.5 Weight of Evidence (WOE) Transformation | p. 200 |
7.6 Variable Selection Overview | p. 204 |
7.7 Missing Data Imputation | p. 206 |
7.8 Stepwise Selection Methods | p. 207 |
7.8.1 Forward Selection in Linear Regression | p. 208 |
7.8.2 Forward Selection in Logistic Regression | p. 208 |
7.9 Mutual Information, KL Distance | p. 209 |
7.10 Detection of Multicollinearity | p. 210 |
8 Model Goodness Measures | p. 213 |
8.1 Training, Testing, Validation | p. 213 |
8.2 Continuous Dependent Variable | p. 215 |
8.2.1 Example: Linear Regression | p. 217 |
8.3 Binary Dependent Variable (Two-Group Classification) | p. 218 |
8.3.1 Kolmogorov-Smirnov (KS) Statistic | p. 218 |
8.3.2 Confusion Matrix | p. 220 |
8.3.3 Concordant and Discordant | p. 221 |
8.3.4 R² for Logistic Regression | p. 223 |
8.3.5 AIC and SBC | p. 224 |
8.3.6 Hosmer-Lemeshow Goodness-of-Fit Test | p. 224 |
8.3.7 Example: Logistic Regression | p. 225 |
8.4 Population Stability Index Using Relative Entropy | p. 227 |
9 Optimization Methods | p. 231 |
9.1 Lagrange Multiplier | p. 232 |
9.2 Gradient Descent Method | p. 234 |
9.3 Newton-Raphson Method | p. 236 |
9.4 Conjugate Gradient Method | p. 238 |
9.5 Quasi-Newton Method | p. 240 |
9.6 Genetic Algorithms (GA) | p. 242 |
9.7 Simulated Annealing | p. 242 |
9.8 Linear Programming | p. 243 |
9.9 Nonlinear Programming (NLP) | p. 247 |
9.9.1 General Nonlinear Programming (GNLP) | p. 248 |
9.9.2 Lagrange Dual Problem | p. 249 |
9.9.3 Quadratic Programming (QP) | p. 250 |
9.9.4 Linear Complementarity Programming (LCP) | p. 254 |
9.9.5 Sequential Quadratic Programming (SQP) | p. 256 |
9.10 Nonlinear Equations | p. 263 |
9.11 Expectation-Maximization (EM) Algorithm | p. 264 |
9.12 Optimal Design of Experiment | p. 268 |
10 Miscellaneous Topics | p. 271 |
10.1 Multidimensional Scaling | p. 271 |
10.2 Simulation | p. 274 |
10.3 Odds Normalization and Score Transformation | p. 278 |
10.4 Reject Inference | p. 280 |
10.5 Dempster-Shafer Theory of Evidence | p. 281 |
10.5.1 Some Properties in Set Theory | p. 281 |
10.5.2 Basic Probability Assignment, Belief Function, and Plausibility Function | p. 282 |
10.5.3 Dempster-Shafer's Rule of Combination | p. 285 |
10.5.4 Applications of Dempster-Shafer Theory of Evidence: Multiple Classifier Function | p. 287 |
Appendix A Useful Mathematical Relations | p. 291 |
A.1 Information Inequality | p. 291 |
A.2 Relative Entropy | p. 291 |
A.3 Saddle-Point Method | p. 292 |
A.4 Stirling's Formula | p. 293 |
A.5 Convex Function and Jensen's Inequality | p. 294 |
Appendix B DataMinerXL - Microsoft Excel Add-In for Building Predictive Models | p. 299 |
B.1 Overview | p. 299 |
B.2 Utility Functions | p. 299 |
B.3 Data Manipulation Functions | p. 300 |
B.4 Basic Statistical Functions | p. 300 |
B.5 Modeling Functions for All Models | p. 301 |
B.6 Weight of Evidence Transformation Functions | p. 301 |
B.7 Linear Regression Functions | p. 302 |
B.8 Partial Least Squares Regression Functions | p. 302 |
B.9 Logistic Regression Functions | p. 303 |
B.10 Time Series Analysis Functions | p. 303 |
B.11 Naive Bayes Classifier Functions | p. 303 |
B.12 Tree-Based Model Functions | p. 304 |
B.13 Clustering and Segmentation Functions | p. 304 |
B.14 Neural Network Functions | p. 304 |
B.15 Support Vector Machine Functions | p. 304 |
B.16 Optimization Functions | p. 305 |
B.17 Matrix Operation Functions | p. 305 |
B.18 Numerical Integration Functions | p. 306 |
B.19 Excel Built-in Statistical Distribution Functions | p. 306 |
Bibliography | p. 309 |
Index | p. 313 |