Available:
Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010337167 | HD30.25 N48 2014 | Open Access Book | Book |
33000000016496 | HD30.25 N48 2014 | Open Access Book | Book |
Summary
Whether you are brand new to data mining or working on your tenth predictive analytics project, Commercial Data Mining will be there for you as an accessible reference outlining the entire process and related themes. In this book, you'll learn that your organization does not need a huge volume of data or a Fortune 500 budget to generate business using existing information assets. Expert author David Nettleton guides you through the process from beginning to end and covers everything from business objectives to data sources, and selection to analysis and predictive modeling.
Commercial Data Mining includes case studies and practical examples from Nettleton's more than 20 years of commercial experience. Real-world cases covering customer loyalty, cross-selling, and audience prediction in industries including insurance, banking, and media illustrate the concepts and techniques explained throughout the book.
Author Notes
David F. Nettleton has more than 25 years of experience in IT system development, specializing in databases and data analysis. He has a Bachelor of Science degree in Computer Science, a Master of Science degree in Computer Software and Systems Design, and a Ph.D. in Artificial Intelligence. He has worked for IBM as a Business Intelligence Consultant, among other companies. In 1995 he founded his own consultancy dedicated to commercial data analysis projects, working in the banking, insurance, media, industry, and health sectors. He has published over 40 articles and papers in journals, magazines, and national and international congresses, and has given many presentations at conferences and workshops. He is currently a contract researcher at the Universitat Pompeu Fabra, Barcelona, Spain, and at the IIIA-CSIC, Spain, specializing in data mining applied to online social networks and data privacy. Dr. Nettleton was born in England and has lived in Barcelona, Spain, since 1988.
Table of Contents
Acknowledgments | p. xi |
1 Introduction | p. 1 |
2 Business Objectives | p. 7 |
Introduction | p. 7 |
Criteria for Choosing a Viable Project | p. 8 |
Evaluation of Potential Commercial Data Analysis Projects - General Considerations | p. 8 |
Evaluation of Viability in Terms of Available Data - Specific Considerations | p. 8 |
Factors That Influence Project Benefits | p. 9 |
Factors That Influence Project Costs | p. 10 |
Example 1: Customer Call Center - Objective: IT Support for Customer Reclamations | p. 10 |
Overall Evaluation of the Cost and Benefit of Mr. Strong's Project | p. 12 |
Example 2: Online Music App - Objective: Determine Effectiveness of Advertising for Mobile Device Apps | p. 13 |
Overall Evaluation of the Cost and Benefit of Melody-online's Project | p. 14 |
Summary | p. 15 |
Further Reading | p. 16 |
3 Incorporating Various Sources of Data and Information | p. 17 |
Introduction | p. 17 |
Data about a Business's Products and Services | p. 19 |
Surveys and Questionnaires | p. 20 |
Examples of Survey and Questionnaire Forms | p. 21 |
Surveys and Questionnaires: Data Table Population | p. 24 |
Issues When Designing Forms | p. 24 |
Loyalty Card/Customer Card | p. 26 |
Registration Form for a Customer Card | p. 27 |
Customer Card Registrations: Data Table Population | p. 30 |
Transactional Analysis of Customer Card Usage | p. 36 |
Demographic Data | p. 38 |
The Census: Census Data, United States, 2010 | p. 39 |
Macro-Economic Data | p. 40 |
Data about Competitors | p. 43 |
Financial Markets Data: Stocks, Shares, Commodities, and Investments | p. 45 |
4 Data Representation | p. 49 |
Introduction | p. 49 |
Basic Data Representation | p. 49 |
Basic Data Types | p. 49 |
Representation, Comparison, and Processing of Variables of Different Types | p. 51 |
Normalization of the Values of a Variable | p. 56 |
Distribution of the Values of a Variable | p. 57 |
Atypical Values - Outliers | p. 58 |
Advanced Data Representation | p. 61 |
Hierarchical Data | p. 61 |
Semantic Networks | p. 62 |
Graph Data | p. 63 |
Fuzzy Data | p. 64 |
5 Data Quality | p. 67 |
Introduction | p. 67 |
Examples of Typical Data Problems | p. 69 |
Content Errors in the Data | p. 70 |
Relevance and Reliability | p. 71 |
Quantitative Evaluation of the Data Quality | p. 73 |
Data Extraction and Data Quality - Common Mistakes and How to Avoid Them | p. 74 |
Data Extraction | p. 74 |
Derived Data | p. 77 |
Summary of Data Extraction Example | p. 77 |
How Data Entry and Data Creation May Affect Data Quality | p. 78 |
6 Selection of Variables and Factor Derivation | p. 79 |
Introduction | p. 79 |
Selection from the Available Data | p. 80 |
Statistical Techniques for Evaluating a Set of Input Variables | p. 81 |
Summary of the Approach of Selecting from the Available Data | p. 87 |
Reverse Engineering: Selection by Considering the Desired Result | p. 87 |
Statistical Techniques for Evaluating and Selecting Input Variables for a Specific Business Objective | p. 87 |
Transforming Numerical Variables into Ordinal Categorical Variables | p. 90 |
Customer Segmentation | p. 92 |
Summary of the Reverse Engineering Approach | p. 99 |
Data Mining Approaches to Selecting Variables | p. 99 |
Rule Induction | p. 99 |
Neural Networks | p. 100 |
Clustering | p. 101 |
Packaged Solutions: Preselecting Specific Variables for a Given Business Sector | p. 101 |
The FAMS (Fraud and Abuse Management) System | p. 103 |
Summary | p. 104 |
7 Data Sampling and Partitioning | p. 105 |
Introduction | p. 105 |
Sampling for Data Reduction | p. 106 |
Partitioning the Data Based on Business Criteria | p. 111 |
Issues Related to Sampling | p. 115 |
Sampling versus Big Data | p. 116 |
8 Data Analysis | p. 119 |
Introduction | p. 119 |
Visualization | p. 120 |
Associations | p. 121 |
Clustering and Segmentation | p. 122 |
Segmentation and Visualization | p. 124 |
Analysis of Transactional Sequences | p. 129 |
Analysis of Time Series | p. 130 |
Bank Current Account: Time Series Data Profiles | p. 131 |
Typical Mistakes when Performing Data Analysis and Interpreting Results | p. 134 |
9 Data Modeling | p. 137 |
Introduction | p. 137 |
Modeling Concepts and Issues | p. 137 |
Supervised and Unsupervised Learning | p. 137 |
Cross-Validation | p. 138 |
Evaluating the Results of Data Models - Measuring Precision | p. 139 |
Neural Networks | p. 141 |
Predictive Neural Networks | p. 141 |
Kohonen Neural Network for Clustering | p. 144 |
Classification: Rule/Tree Induction | p. 144 |
The ID3 Decision Tree Induction Algorithm | p. 146 |
The C4.5 Decision Tree Induction Algorithm | p. 147 |
The C5.0 Decision Tree Induction Algorithm | p. 148 |
Traditional Statistical Models | p. 149 |
Regression Techniques | p. 149 |
Summary of the Use of Regression Techniques | p. 151 |
K-means | p. 151 |
Other Methods and Techniques for Creating Predictive Models | p. 152 |
Applying the Models to the Data | p. 153 |
Simulation Models - "What If?" | p. 154 |
Summary of Modeling | p. 156 |
10 Deployment Systems: From Query Reporting to EIS and Expert Systems | p. 159 |
Introduction | p. 159 |
Query and Report Generation | p. 159 |
Query and Reporting Systems | p. 163 |
Executive Information Systems | p. 164 |
EIS Interface for a "What If" Scenario Modeler | p. 164 |
Executive Information Systems (EIS) | p. 166 |
Expert Systems | p. 167 |
Case-Based Systems | p. 169 |
Summary | p. 170 |
11 Text Analysis | p. 171 |
Basic Analysis of Textual Information | p. 171 |
Advanced Analysis of Textual Information | p. 172 |
Keyword Definition and Information Retrieval | p. 173 |
Identification of Names and Personal Information of Individuals | p. 173 |
Identifying Blocks of Interesting Text | p. 174 |
Information Retrieval Concepts | p. 175 |
Assessing Sentiment on Social Media | p. 176 |
Commercial Text Mining Products | p. 178 |
12 Data Mining from Relationally Structured Data, Marts, and Warehouses | p. 181 |
Introduction | p. 181 |
Data Warehouse and Data Marts | p. 182 |
Creating a File or Table for Data Mining | p. 186 |
13 CRM - Customer Relationship Management and Analysis | p. 195 |
Introduction | p. 195 |
CRM Metrics and Data Collection | p. 195 |
Customer Life Cycle | p. 196 |
Example: Retail Bank | p. 198 |
Integrated CRM Systems | p. 200 |
CRM Application Software | p. 200 |
Customer Satisfaction | p. 201 |
Example CRM Application | p. 201 |
14 Analysis of Data on the Internet I - Website Analysis and Internet Search (Online Chapter) | p. 209 |
15 Analysis of Data on the Internet II - Search Experience Analysis (Online Chapter) | p. 211 |
16 Analysis of Data on the Internet III - Online Social Network Analysis (Online Chapter) | p. 213 |
17 Analysis of Data on the Internet IV - Search Trend Analysis over Time (Online Chapter) | p. 215 |
18 Data Privacy and Privacy-Preserving Data Publishing | p. 217 |
Introduction | p. 217 |
Popular Applications and Data Privacy | p. 218 |
Legal Aspects - Responsibility and Limits | p. 220 |
Privacy-Preserving Data Publishing | p. 221 |
Privacy Concepts | p. 221 |
Anonymization Techniques | p. 223 |
Document Sanitization | p. 226 |
19 Creating an Environment for Commercial Data Analysis | p. 229 |
Introduction | p. 229 |
Integrated Commercial Data Analysis Tools | p. 229 |
Creating an Ad Hoc/Low-Cost Environment for Commercial Data Analysis | p. 233 |
20 Summary | p. 239 |
Appendix: Case Studies | p. 241 |
Case Study 1 Customer Loyalty at an Insurance Company | p. 241 |
Introduction | p. 241 |
Definition of the Operational and Informational Data of Interest | p. 242 |
Data Extraction and Creation of Files for Analysis | p. 242 |
Data Exploration | p. 243 |
Modeling Phase | p. 248 |
Case Study 2 Cross-Selling a Pension Plan at a Retail Bank | p. 251 |
Introduction | p. 252 |
Data Definition | p. 252 |
Data Analysis | p. 255 |
Model Generation | p. 259 |
Results and Conclusions | p. 262 |
Example Weka Screens: Data Processing, Analysis, and Modeling | p. 262 |
Case Study 3 Audience Prediction for a Television Channel | p. 268 |
Introduction | p. 268 |
Data Definition | p. 269 |
Data Analysis | p. 270 |
Audience Prediction by Program | p. 272 |
Audience Prediction for Publicity Blocks | p. 273 |
Glossary (Online) | p. 277 |
Bibliography | p. 279 |
Index | p. 281 |