Availability
Library | Item Barcode | Call Number | Material Type | Item Category 1 | Status |
---|---|---|---|---|---|
 | 30000010371675 | Q325.5 B69 2019 | Open Access Book | Book | On Order |
Summary
Machine Learning with Spark and Python: Essential Techniques for Predictive Analytics, Second Edition simplifies machine learning for practical uses by focusing on two key algorithms. This new second edition improves on the first with the addition of Spark, a machine learning framework from the Apache Software Foundation. With Spark, machine learning students can easily process much larger data sets and call Spark's algorithms using ordinary Python code.
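As a minimal sketch of that workflow (assuming a local Spark installation; the file name ratings.csv is hypothetical), a data set too large for one machine's memory can be loaded and summarized from ordinary Python:

```python
from pyspark.sql import SparkSession

# Start a local Spark session; the same code scales out to a
# cluster by changing only the deployment configuration.
spark = SparkSession.builder.appName("explore").getOrCreate()

# Hypothetical file name for illustration; Spark partitions the
# read across workers, so the file need not fit in local memory.
df = spark.read.csv("ratings.csv", header=True, inferSchema=True)

df.describe().show()  # column-wise statistical summary
spark.stop()
```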
Machine Learning with Spark and Python focuses on two algorithm families (linear methods and ensemble methods) that effectively predict outcomes. This type of problem covers many use cases, such as deciding which ad to place on a web page, predicting prices in securities markets, or detecting credit card fraud. The focus on two families leaves enough room for full descriptions of the mechanisms at work in each algorithm, and the code examples then illustrate that machinery with specific, hackable code.
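Both families are available in PySpark's ml library; the sketch below is illustrative only (toy data, and parameter values not taken from the book), showing a penalized linear model and a tree ensemble trained side by side:

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression, GBTRegressor
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("two-families").getOrCreate()

# Toy data: (x1, x2, label). In practice this would be read from
# storage as in the previous sketch.
rows = [(1.0, 2.0, 3.5), (2.0, 1.0, 2.5), (3.0, 4.0, 7.0), (4.0, 3.0, 6.0)]
df = spark.createDataFrame(rows, ["x1", "x2", "y"])
train = VectorAssembler(inputCols=["x1", "x2"],
                        outputCol="features").transform(df)

# Linear family: elastic-net penalized linear regression.
linear = LinearRegression(featuresCol="features", labelCol="y",
                          regParam=0.1, elasticNetParam=0.5).fit(train)

# Ensemble family: gradient-boosted trees.
ensemble = GBTRegressor(featuresCol="features", labelCol="y").fit(train)

linear.transform(train).select("y", "prediction").show()
ensemble.transform(train).select("y", "prediction").show()
spark.stop()
```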
Author Notes
Michael Bowles teaches machine learning at UC Berkeley, the University of New Haven, and Hacker Dojo in Silicon Valley; consults on machine learning projects; and is involved in a number of startups in areas such as semiconductor inspection, drug design and optimization, and trading in the financial markets. Following an assistant professorship at MIT, Michael went on to found and run two Silicon Valley startups, both of which went public. His courses are always popular and receive great feedback from participants.
Table of Contents
Introduction | p. xxi |
Chapter 1 The Two Essential Algorithms for Making Predictions | p. 1 |
Why Are These Two Algorithms So Useful? | p. 2 |
What Are Penalized Regression Methods? | p. 7 |
What Are Ensemble Methods? | p. 9 |
How to Decide Which Algorithm to Use | p. 11 |
The Process Steps for Building a Predictive Model | p. 13 |
Framing a Machine Learning Problem | p. 15 |
Feature Extraction and Feature Engineering | p. 17 |
Determining Performance of a Trained Model | p. 18 |
Chapter Contents and Dependencies | p. 18 |
Summary | p. 20 |
Chapter 2 Understand the Problem by Understanding the Data | p. 23 |
The Anatomy of a New Problem | p. 24 |
Different Types of Attributes and Labels Drive Modeling Choices | p. 26 |
Things to Notice about Your New Data Set | p. 27 |
Classification Problems: Detecting Unexploded Mines Using Sonar | p. 28 |
Physical Characteristics of the Rocks Versus Mines Data Set | p. 29 |
Statistical Summaries of the Rocks Versus Mines Data Set | p. 32 |
Visualization of Outliers Using a Quantile-Quantile Plot | p. 34 |
Statistical Characterization of Categorical Attributes | p. 35 |
How to Use Python Pandas to Summarize the Rocks Versus Mines Data Set | p. 36 |
Visualizing Properties of the Rocks Versus Mines Data Set | p. 39 |
Visualizing with Parallel Coordinates Plots | p. 39 |
Visualizing Interrelationships between Attributes and Labels | p. 41 |
Visualizing Attribute and Label Correlations Using a Heat Map | p. 48 |
Summarizing the Process for Understanding the Rocks Versus Mines Data Set | p. 50 |
Real-Valued Predictions with Factor Variables: How Old Is Your Abalone? | p. 50 |
Parallel Coordinates for Regression Problems-Visualize Variable Relationships for the Abalone Problem | p. 55 |
How to Use a Correlation Heat Map for Regression-Visualize Pair-Wise Correlations for the Abalone Problem | p. 59 |
Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes | p. 61 |
Multiclass Classification Problem: What Type of Glass Is That? | p. 67 |
Using PySpark to Understand Large Data Sets | p. 72 |
Summary | p. 75 |
Chapter 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data | p. 77 |
The Basic Problem: Understanding Function Approximation | p. 78 |
Working with Training Data | p. 79 |
Assessing Performance of Predictive Models | p. 81 |
Factors Driving Algorithm Choices and Performance-Complexity and Data | p. 82 |
Contrast between a Simple Problem and a Complex Problem | p. 82 |
Contrast between a Simple Model and a Complex Model | p. 85 |
Factors Driving Predictive Algorithm Performance | p. 89 |
Choosing an Algorithm: Linear or Nonlinear? | p. 90 |
Measuring the Performance of Predictive Models | p. 91 |
Performance Measures for Different Types of Problems | p. 91 |
Simulating Performance of Deployed Models | p. 105 |
Achieving Harmony between Model and Data | p. 107 |
Choosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size | p. 107 |
Using Forward Stepwise Regression to Control Overfitting | p. 109 |
Evaluating and Understanding Your Predictive Model | p. 114 |
Control Overfitting by Penalizing Regression Coefficients-Ridge Regression | p. 116 |
Using PySpark for Training Penalized Regression Models on Extremely Large Data Sets | p. 124 |
Summary | p. 127 |
Chapter 4 Penalized Linear Regression | p. 129 |
Why Penalized Linear Regression Methods Are So Useful | p. 130 |
Extremely Fast Coefficient Estimation | p. 130 |
Variable Importance Information | p. 131 |
Extremely Fast Evaluation When Deployed | p. 131 |
Reliable Performance | p. 131 |
Sparse Solutions | p. 132 |
Problem May Require Linear Model | p. 132 |
When to Use Ensemble Methods | p. 132 |
Penalized Linear Regression: Regulating Linear Regression for Optimum Performance | p. 132 |
Training Linear Models: Minimizing Errors and More | p. 135 |
Adding a Coefficient Penalty to the OLS Formulation | p. 136 |
Other Useful Coefficient Penalties-Manhattan and ElasticNet | p. 137 |
Why Lasso Penalty Leads to Sparse Coefficient Vectors | p. 138 |
ElasticNet Penalty Includes Both Lasso and Ridge | p. 140 |
Solving the Penalized Linear Regression Problem | p. 141 |
Understanding Least Angle Regression and Its Relationship to Forward Stepwise Regression | p. 141 |
How LARS Generates Hundreds of Models of Varying Complexity | p. 145 |
Choosing the Best Model from the Hundreds LARS Generates | p. 147 |
Using Glmnet: Very Fast and Very General | p. 152 |
Comparison of the Mechanics of Glmnet and LARS Algorithms | p. 153 |
Initializing and Iterating the Glmnet Algorithm | p. 153 |
Extension of Linear Regression to Classification Problems | p. 157 |
Solving Classification Problems with Penalized Regression | p. 157 |
Working with Classification Problems Having More Than Two Outcomes | p. 161 |
Understanding Basis Expansion: Using Linear Methods on Nonlinear Problems | p. 161 |
Incorporating Non-Numeric Attributes into Linear Methods | p. 163 |
Summary | p. 166 |
Chapter 5 Building Predictive Models Using Penalized Linear Methods | p. 169 |
Python Packages for Penalized Linear Regression | p. 170 |
Multivariable Regression: Predicting Wine Taste | p. 171 |
Building and Testing a Model to Predict Wine Taste | p. 172 |
Training on the Whole Data Set before Deployment | p. 175 |
Basis Expansion: Improving Performance by Creating New Variables from Old Ones | p. 179 |
Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines | p. 182 |
Build a Rocks Versus Mines Classifier for Deployment | p. 191 |
Multiclass Classification: Classifying Crime Scene Glass Samples | p. 200 |
Linear Regression and Classification Using PySpark | p. 203 |
Using PySpark to Predict Wine Taste | p. 204 |
Logistic Regression with PySpark: Rocks Versus Mines | p. 208 |
Incorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings | p. 213 |