Title:
Mining the Social Web
Edition:
Third Edition
Physical Description:
xxiv, 400 pages : illustrations ; 24 cm.
ISBN:
9781491985045

Available:

Item Barcode:
30000010376422
Call Number:
QA76.9.D343 R87 2019
Material Type:
Open Access Book
Item Category 1:
Book

On Order

Summary

Mine the rich data tucked away in popular social websites such as Twitter, Facebook, LinkedIn, and Instagram. With the third edition of this popular guide, data scientists, analysts, and programmers will learn how to glean insights from social media--including who's connecting with whom, what they're talking about, and where they're located--using Python code examples, Jupyter notebooks, or Docker containers.

In part one, each standalone chapter focuses on one aspect of the social landscape, including each of the major social sites, as well as web pages, blogs and feeds, mailboxes, GitHub, and a newly added chapter covering Instagram. Part two provides a cookbook with two dozen bite-size recipes for solving particular issues with Twitter.

Get a straightforward synopsis of the social web landscape
Use Docker to easily run each chapter's example code, packaged as a Jupyter notebook
Adapt and contribute to the code's open source GitHub repository
Learn how to employ best-in-class Python 3 tools to slice and dice the data you collect
Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, clique detection, and image recognition
Build beautiful data visualizations with Python and JavaScript toolkits


Author Notes

Matthew Russell is Chief Technology Officer at Built Technologies, where he leads a team of leaders on a mission to improve the way the world is built. On nights and weekends, he contemplates ultimate reality, practices rugged individualism, and trains for the possibilities of a zombie or robot apocalypse.
Mikhail Klassen is cofounder and Chief Data Scientist at Paladin AI, an aerospace analytics startup based in Montreal, where he works on designing next-generation data-driven adaptive training solutions for pilots using data mining techniques and machine learning.


Table of Contents

Preface
Part I A Guided Tour of the Social Web
Prelude
1 Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
1.1 Overview
1.2 Why Is Twitter All the Rage?
1.3 Exploring Twitter's API
1.3.1 Fundamental Twitter Terminology
1.3.2 Creating a Twitter API Connection
1.3.3 Exploring Trending Topics
1.3.4 Searching for Tweets
1.4 Analyzing the 140 (or More) Characters
1.4.1 Extracting Tweet Entities
1.4.2 Analyzing Tweets and Tweet Entities with Frequency Analysis
1.4.3 Computing the Lexical Diversity of Tweets
1.4.4 Examining Patterns in Retweets
1.4.5 Visualizing Frequency Data with Histograms
1.5 Closing Remarks
1.6 Recommended Exercises
1.7 Online Resources
2 Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
2.1 Overview
2.2 Exploring Facebook's Graph API
2.2.1 Understanding the Graph API
2.2.2 Understanding the Open Graph Protocol
2.3 Analyzing Social Graph Connections
2.3.1 Analyzing Facebook Pages
2.3.2 Manipulating Data Using pandas
2.4 Closing Remarks
2.5 Recommended Exercises
2.6 Online Resources
3 Mining Instagram: Computer Vision, Neural Networks, Object Recognition, and Face Detection
3.1 Overview
3.2 Exploring the Instagram API
3.2.1 Making Instagram API Requests
3.2.2 Retrieving Your Own Instagram Feed
3.2.3 Retrieving Media by Hashtag
3.3 Anatomy of an Instagram Post
3.4 Crash Course on Artificial Neural Networks
3.4.1 Training a Neural Network to "Look" at Pictures
3.4.2 Recognizing Handwritten Digits
3.4.3 Object Recognition Within Photos Using Pretrained Neural Networks
3.5 Applying Neural Networks to Instagram Posts
3.5.1 Tagging the Contents of an Image
3.5.2 Detecting Faces in Images
3.6 Closing Remarks
3.7 Recommended Exercises
3.8 Online Resources
4 Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
4.1 Overview
4.2 Exploring the LinkedIn API
4.2.1 Making LinkedIn API Requests
4.2.2 Downloading LinkedIn Connections as a CSV File
4.3 Crash Course on Clustering Data
4.3.1 Normalizing Data to Enable Analysis
4.3.2 Measuring Similarity
4.3.3 Clustering Algorithms
4.4 Closing Remarks
4.5 Recommended Exercises
4.6 Online Resources
5 Mining Text Files: Computing Document Similarity, Extracting Collocations, and More
5.1 Overview
5.2 Text Files
5.3 A Whiz-Bang Introduction to TF-IDF
5.3.1 Term Frequency
5.3.2 Inverse Document Frequency
5.3.3 TF-IDF
5.4 Querying Human Language Data with TF-IDF
5.4.1 Introducing the Natural Language Toolkit
5.4.2 Applying TF-IDF to Human Language
5.4.3 Finding Similar Documents
5.4.4 Analyzing Bigrams in Human Language
5.4.5 Reflections on Analyzing Human Language Data
5.5 Closing Remarks
5.6 Recommended Exercises
5.7 Online Resources
6 Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
6.1 Overview
6.2 Scraping, Parsing, and Crawling the Web
6.2.1 Breadth-First Search in Web Crawling
6.3 Discovering Semantics by Decoding Syntax
6.3.1 Natural Language Processing Illustrated Step-by-Step
6.3.2 Sentence Detection in Human Language Data
6.3.3 Document Summarization
6.4 Entity-Centric Analysis: A Paradigm Shift
6.4.1 Gisting Human Language Data
6.5 Quality of Analytics for Processing Human Language Data
6.6 Closing Remarks
6.7 Recommended Exercises
6.8 Online Resources
7 Mining Mailboxes: Analyzing Who's Talking to Whom About What, How Often, and More
7.1 Overview
7.2 Obtaining and Processing a Mail Corpus
7.2.1 A Primer on Unix Mailboxes
7.2.2 Getting the Enron Data
7.2.3 Converting a Mail Corpus to a Unix Mailbox
7.2.4 Converting Unix Mailboxes to pandas DataFrames
7.3 Analyzing the Enron Corpus
7.3.1 Querying by Date/Time Range
7.3.2 Analyzing Patterns in Sender/Recipient Communications
7.3.3 Searching Emails by Keywords
7.4 Analyzing Your Own Mail Data
7.4.1 Accessing Your Gmail with OAuth
7.4.2 Fetching and Parsing Email Messages
7.4.3 Visualizing Patterns in Email with Immersion
7.5 Closing Remarks
7.6 Recommended Exercises
7.7 Online Resources
8 Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
8.1 Overview
8.2 Exploring GitHub's API
8.2.1 Creating a GitHub API Connection
8.2.2 Making GitHub API Requests
8.3 Modeling Data with Property Graphs
8.4 Analyzing GitHub Interest Graphs
8.4.1 Seeding an Interest Graph
8.4.2 Computing Graph Centrality Measures
8.4.3 Extending the Interest Graph with "Follows" Edges for Users
8.4.4 Using Nodes as Pivots for More Efficient Queries
8.4.5 Visualizing Interest Graphs
8.5 Closing Remarks
8.6 Recommended Exercises
8.7 Online Resources
Part II Twitter Cookbook
9 Twitter Cookbook
9.1 Accessing Twitter's API for Development Purposes
9.2 Doing the OAuth Dance to Access Twitter's API for Production Purposes
9.3 Discovering the Trending Topics
9.4 Searching for Tweets
9.5 Constructing Convenient Function Calls
9.6 Saving and Restoring JSON Data with Text Files
9.7 Saving and Accessing JSON Data with MongoDB
9.8 Sampling the Twitter Firehose with the Streaming API
9.9 Collecting Time-Series Data
9.10 Extracting Tweet Entities
9.11 Finding the Most Popular Tweets in a Collection of Tweets
9.12 Finding the Most Popular Tweet Entities in a Collection of Tweets
9.13 Tabulating Frequency Analysis
9.14 Finding Users Who Have Retweeted a Status
9.15 Extracting a Retweet's Attribution
9.16 Making Robust Twitter Requests
9.17 Resolving User Profile Information
9.18 Extracting Tweet Entities from Arbitrary Text
9.19 Getting All Friends or Followers for a User
9.20 Analyzing a User's Friends and Followers
9.21 Harvesting a User's Tweets
9.22 Crawling a Friendship Graph
9.23 Analyzing Tweet Content
9.24 Summarizing Link Targets
9.25 Analyzing a User's Favorite Tweets
9.26 Closing Remarks
9.27 Recommended Exercises
9.28 Online Resources
Part III Appendixes
A Information About This Book's Virtual Machine Experience
B OAuth Primer
C Python and Jupyter Notebook Tips and Tricks
Index