Title:
Data mining for business intelligence : concepts, techniques, and applications in Microsoft Office Excel with XLMiner
Personal Author:
Shmueli, Galit
Publication Information:
Hoboken, NJ : John Wiley & Sons, 2007
ISBN:
9780470084854

Available:

Item Barcode: 30000010141222
Call Number: HF5548.2 S554 2007
Material Type: Open Access Book
Item Category 1: Book

Summary

Learn how to develop models for classification, prediction, and customer segmentation with the help of Data Mining for Business Intelligence

In today's world, businesses are becoming more capable of reaching their ideal consumers, and an understanding of data mining contributes to this success. Data Mining for Business Intelligence, developed from courses taught at the Massachusetts Institute of Technology's Sloan School of Management and the University of Maryland's Smith School of Business, uses real data and actual cases to illustrate the applicability of data mining to the development of successful business models.

Featuring XLMiner, a Microsoft Office Excel add-in, this book allows readers to follow along and implement algorithms at their own speed, with a minimal learning curve. In addition, students and practitioners of data mining techniques are presented with hands-on, business-oriented applications. Abundant exercises and examples are provided to motivate learning and understanding.

Data Mining for Business Intelligence:

Provides both a theoretical and practical understanding of the key methods of classification, prediction, reduction, exploration, and affinity analysis
Features a business decision-making context for these key methods
Illustrates the application and interpretation of these methods using real business cases and data

This book helps readers understand the beneficial relationship that can be established between data mining and smart business practices, and is an excellent learning tool for creating valuable strategies and making wiser business decisions.


Author Notes

Galit Shmueli, PhD, is Assistant Professor of Statistics in the Decision and Information Technologies Department of the Robert H. Smith School of Business at the University of Maryland.
Nitin R. Patel, PhD, is Chairman, Founder, and Chief Technology Officer of Cambridge-based Cytel Incorporated and a Visiting Professor in the Engineering Systems Division at the Massachusetts Institute of Technology.


Table of Contents

Foreword p. xiii
Preface p. xv
Acknowledgments p. xvii
1 Introduction p. 1
1.1 What Is Data Mining? p. 1
1.2 Where Is Data Mining Used? p. 2
1.3 The Origins of Data Mining p. 2
1.4 The Rapid Growth of Data Mining p. 3
1.5 Why Are There So Many Different Methods? p. 4
1.6 Terminology and Notation p. 4
1.7 Road Maps to This Book p. 6
2 Overview of the Data Mining Process p. 9
2.1 Introduction p. 9
2.2 Core Ideas in Data Mining p. 9
2.3 Supervised and Unsupervised Learning p. 11
2.4 The Steps in Data Mining p. 11
2.5 Preliminary Steps p. 13
2.6 Building a Model: Example with Linear Regression p. 21
2.7 Using Excel for Data Mining p. 27
Problems p. 31
3 Data Exploration and Dimension Reduction p. 35
3.1 Introduction p. 35
3.2 Practical Considerations p. 35
Example 1: House Prices in Boston p. 36
3.3 Data Summaries p. 37
3.4 Data Visualization p. 38
3.5 Correlation Analysis p. 40
3.6 Reducing the Number of Categories in Categorical Variables p. 41
3.7 Principal Components Analysis p. 41
Example 2: Breakfast Cereals p. 42
Principal Components p. 45
Normalizing the Data p. 46
Using Principal Components for Classification and Prediction p. 49
Problems p. 51
4 Evaluating Classification and Predictive Performance p. 53
4.1 Introduction p. 53
4.2 Judging Classification Performance p. 53
Accuracy Measures p. 53
Cutoff for Classification p. 56
Performance in Unequal Importance of Classes p. 60
Asymmetric Misclassification Costs p. 61
Oversampling and Asymmetric Costs p. 66
Classification Using a Triage Strategy p. 72
4.3 Evaluating Predictive Performance p. 72
Problems p. 74
5 Multiple Linear Regression p. 75
5.1 Introduction p. 75
5.2 Explanatory vs. Predictive Modeling p. 76
5.3 Estimating the Regression Equation and Prediction p. 76
Example: Predicting the Price of Used Toyota Corolla Automobiles p. 77
5.4 Variable Selection in Linear Regression p. 81
Reducing the Number of Predictors p. 81
How to Reduce the Number of Predictors p. 82
Problems p. 86
6 Three Simple Classification Methods p. 91
6.1 Introduction p. 91
Example 1: Predicting Fraudulent Financial Reporting p. 91
Example 2: Predicting Delayed Flights p. 92
6.2 The Naive Rule p. 92
6.3 Naive Bayes p. 93
Conditional Probabilities and Pivot Tables p. 94
A Practical Difficulty p. 94
A Solution: Naive Bayes p. 95
Advantages and Shortcomings of the Naive Bayes Classifier p. 100
6.4 k-Nearest Neighbors p. 103
Example 3: Riding Mowers p. 104
Choosing k p. 105
k-NN for a Quantitative Response p. 106
Advantages and Shortcomings of k-NN Algorithms p. 106
Problems p. 108
7 Classification and Regression Trees p. 111
7.1 Introduction p. 111
7.2 Classification Trees p. 113
7.3 Recursive Partitioning p. 113
7.4 Example 1: Riding Mowers p. 113
Measures of Impurity p. 115
7.5 Evaluating the Performance of a Classification Tree p. 120
Example 2: Acceptance of Personal Loan p. 120
7.6 Avoiding Overfitting p. 121
Stopping Tree Growth: CHAID p. 121
Pruning the Tree p. 125
7.7 Classification Rules from Trees p. 130
7.8 Regression Trees p. 130
Prediction p. 130
Measuring Impurity p. 131
Evaluating Performance p. 132
7.9 Advantages, Weaknesses, and Extensions p. 132
Problems p. 134
8 Logistic Regression p. 137
8.1 Introduction p. 137
8.2 The Logistic Regression Model p. 138
Example: Acceptance of Personal Loan p. 139
Model with a Single Predictor p. 141
Estimating the Logistic Model from Data: Computing Parameter Estimates p. 143
Interpreting Results in Terms of Odds p. 144
8.3 Why Linear Regression Is Inappropriate for a Categorical Response p. 146
8.4 Evaluating Classification Performance p. 148
Variable Selection p. 148
8.5 Evaluating Goodness of Fit p. 150
8.6 Example of Complete Analysis: Predicting Delayed Flights p. 153
Data Preprocessing p. 154
Model Fitting and Estimation p. 155
Model Interpretation p. 155
Model Performance p. 155
Goodness of Fit p. 157
Variable Selection p. 158
8.7 Logistic Regression for More Than Two Classes p. 160
Ordinal Classes p. 160
Nominal Classes p. 161
Problems p. 163
9 Neural Nets p. 167
9.1 Introduction p. 167
9.2 Concept and Structure of a Neural Network p. 168
9.3 Fitting a Network to Data p. 168
Example 1: Tiny Dataset p. 169
Computing Output of Nodes p. 170
Preprocessing the Data p. 172
Training the Model p. 172
Example 2: Classifying Accident Severity p. 176
Avoiding Overfitting p. 177
Using the Output for Prediction and Classification p. 181
9.4 Required User Input p. 181
9.5 Exploring the Relationship Between Predictors and Response p. 182
9.6 Advantages and Weaknesses of Neural Networks p. 182
Problems p. 184
10 Discriminant Analysis p. 187
10.1 Introduction p. 187
10.2 Example 1: Riding Mowers p. 187
10.3 Example 2: Personal Loan Acceptance p. 188
10.4 Distance of an Observation from a Class p. 188
10.5 Fisher's Linear Classification Functions p. 191
10.6 Classification Performance of Discriminant Analysis p. 194
10.7 Prior Probabilities p. 195
10.8 Unequal Misclassification Costs p. 195
10.9 Classifying More Than Two Classes p. 196
Example 3: Medical Dispatch to Accident Scenes p. 196
10.10 Advantages and Weaknesses p. 197
Problems p. 200
11 Association Rules p. 203
11.1 Introduction p. 203
11.2 Discovering Association Rules in Transaction Databases p. 203
11.3 Example 1: Synthetic Data on Purchases of Phone Faceplates p. 204
11.4 Generating Candidate Rules p. 204
The Apriori Algorithm p. 205
11.5 Selecting Strong Rules p. 206
Support and Confidence p. 206
Lift Ratio p. 207
Data Format p. 207
The Process of Rule Selection p. 209
Interpreting the Results p. 210
Statistical Significance of Rules p. 211
11.6 Example 2: Rules for Similar Book Purchases p. 212
11.7 Summary p. 212
Problems p. 215
12 Cluster Analysis p. 219
12.1 Introduction p. 219
12.2 Example: Public Utilities p. 220
12.3 Measuring Distance Between Two Records p. 222
Euclidean Distance p. 223
Normalizing Numerical Measurements p. 223
Other Distance Measures for Numerical Data p. 223
Distance Measures for Categorical Data p. 226
Distance Measures for Mixed Data p. 226
12.4 Measuring Distance Between Two Clusters p. 227
12.5 Hierarchical (Agglomerative) Clustering p. 228
Minimum Distance (Single Linkage) p. 229
Maximum Distance (Complete Linkage) p. 229
Group Average (Average Linkage) p. 230
Dendrograms: Displaying Clustering Process and Results p. 230
Validating Clusters p. 231
Limitations of Hierarchical Clustering p. 232
12.6 Nonhierarchical Clustering: The k-Means Algorithm p. 233
Initial Partition into k Clusters p. 234
Problems p. 237
13 Cases p. 241
13.1 Charles Book Club p. 241
13.2 German Credit p. 250
13.3 Tayko Software Cataloger p. 254
13.4 Segmenting Consumers of Bath Soap p. 258
13.5 Direct-Mail Fundraising p. 262
13.6 Catalog Cross-Selling p. 265
13.7 Predicting Bankruptcy p. 267
References p. 271
Index p. 273