Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010371668 | QA76.9.N38 L36 2019 | Open Access Book | Book |
Summary
Description
Modern NLP techniques based on machine learning radically improve the ability of software to recognize patterns, use context to infer meaning, and accurately discern intent from poorly structured text. In Natural Language Processing in Action, readers explore carefully chosen examples and expand their machine's knowledge, which they can then apply to a range of challenges.
Key Features
* Easy to follow
* Clear examples
* Hands-on guide
Audience
A basic understanding of machine learning and some experience with a modern programming language such as Python, Java, C++, or JavaScript will be helpful.
About the technology
Natural Language Processing (NLP) is the discipline of teaching computers to read more like people, and readers can see examples of it in everything from chatbots to the speech-recognition software on their phones.
Hobson Lane has more than 15 years of experience building autonomous systems that make important decisions on behalf of humans.
Hannes Hapke is an electrical engineer turned data scientist with experience in deep learning.
Cole Howard is a carpenter and writer turned deep learning expert.
Author Notes
Hobson Lane, Hannes Max Hapke, and Cole Howard are experienced NLP engineers who use these techniques in production.
Table of Contents
Foreword | p. xiii |
Preface | p. xv |
Acknowledgments | p. xxi |
About this book | p. xxiv |
About the authors | p. xxvii |
About the cover illustration | p. xxix |
Part 1 Wordy Machines | p. 1 |
1 Packets of thought (NLP overview) | p. 3 |
1.1 Natural language vs. programming language | p. 4 |
1.2 The magic | p. 4 |
Machines that converse | p. 5 |
The math | p. 6 |
1.3 Practical applications | p. 8 |
1.4 Language through a computer's "eyes" | p. 9 |
The language of locks | p. 10 |
Regular expressions | p. 11 |
A simple chatbot | p. 12 |
Another way | p. 16 |
1.5 A brief overflight of hyperspace | p. 19 |
1.6 Word order and grammar | p. 21 |
1.7 A chatbot natural language pipeline | p. 22 |
1.8 Processing in depth | p. 25 |
1.9 Natural language IQ | p. 27 |
2 Build your vocabulary (word tokenization) | p. 30 |
2.1 Challenges (a preview of stemming) | p. 32 |
2.2 Building your vocabulary with a tokenizer | p. 33 |
Dot product | p. 41 |
Measuring bag-of-words overlap | p. 42 |
A token improvement | p. 43 |
Extending your vocabulary with n-grams | p. 48 |
Normalizing your vocabulary | p. 54 |
2.3 Sentiment | p. 62 |
VADER: a rule-based sentiment analyzer | p. 64 |
Naive Bayes | p. 65 |
3 Math with words (TF-IDF vectors) | p. 70 |
3.1 Bag of words | p. 71 |
3.2 Vectorizing | p. 76 |
Vector spaces | p. 79 |
3.3 Zipf's Law | p. 83 |
3.4 Topic modeling | p. 86 |
Return of Zipf | p. 89 |
Relevance ranking | p. 90 |
Tools | p. 93 |
Alternatives | p. 93 |
Okapi BM25 | p. 95 |
What's next | p. 95 |
4 Finding meaning in word counts (semantic analysis) | p. 97 |
4.1 From word counts to topic scores | p. 98 |
TF-IDF vectors and lemmatization | p. 99 |
Topic vectors | p. 99 |
Thought experiment | p. 101 |
An algorithm for scoring topics | p. 105 |
An LDA classifier | p. 107 |
4.2 Latent semantic analysis | p. 111 |
Your thought experiment made real | p. 113 |
4.3 Singular value decomposition | p. 116 |
U: left singular vectors | p. 118 |
S: singular values | p. 119 |
VT: right singular vectors | p. 120 |
SVD matrix orientation | p. 120 |
Truncating the topics | p. 121 |
4.4 Principal component analysis | p. 123 |
PCA on 3D vectors | p. 125 |
Stop horsing around and get back to NLP | p. 126 |
Using PCA for SMS message semantic analysis | p. 128 |
Using truncated SVD for SMS message semantic analysis | p. 130 |
How well does LSA work for spam classification? | p. 131 |
4.5 Latent Dirichlet allocation (LDiA) | p. 134 |
The LDiA idea | p. 135 |
LDiA topic model for SMS messages | p. 137 |
LDiA + LDA = spam classifier | p. 140 |
A fairer comparison: 32 LDiA topics | p. 142 |
4.6 Distance and similarity | p. 143 |
4.7 Steering with feedback | p. 146 |
Linear discriminant analysis | p. 147 |
4.8 Topic vector power | p. 148 |
Semantic search | p. 150 |
Improvements | p. 152 |
Part 2 Deeper Learning (Neural Networks) | p. 153 |
5 Baby steps with neural networks (perceptrons and backpropagation) | p. 155 |
5.1 Neural networks, the ingredient list | p. 156 |
Perceptron | p. 157 |
A numerical perceptron | p. 157 |
Detour through bias | p. 158 |
Let's go skiing: the error surface | p. 172 |
Off the chair lift, onto the slope | p. 173 |
Let's shake things up a bit | p. 174 |
Keras: neural networks in Python | p. 175 |
Onward and deepward | p. 179 |
Normalization: input with style | p. 179 |
6 Reasoning with word vectors (Word2vec) | p. 181 |
6.1 Semantic queries and analogies | p. 182 |
Analogy questions | p. 183 |
6.2 Word vectors | p. 184 |
Vector-oriented reasoning | p. 187 |
How to compute Word2vec representations | p. 191 |
How to use the gensim.word2vec module | p. 200 |
How to generate your own word vector representations | p. 202 |
Word2vec vs. GloVe (Global Vectors) | p. 205 |
FastText | p. 205 |
Word2vec vs. LSA | p. 206 |
Visualizing word relationships | p. 207 |
Unnatural words | p. 214 |
Document similarity with Doc2vec | p. 215 |
7 Getting words in order with convolutional neural networks (CNNs) | p. 218 |
7.1 Learning meaning | p. 220 |
7.2 Toolkit | p. 221 |
7.3 Convolutional neural nets | p. 222 |
Building blocks | p. 223 |
Step size (stride) | p. 224 |
Filter composition | p. 224 |
Padding | p. 226 |
Learning | p. 228 |
7.4 Narrow windows indeed | p. 228 |
Implementation in Keras: prepping the data | p. 230 |
Convolutional neural network architecture | p. 235 |
Pooling | p. 236 |
Dropout | p. 238 |
The cherry on the sundae | p. 239 |
Let's get to learning (training) | p. 241 |
Using the model in a pipeline | p. 243 |
Where do you go from here? | p. 244 |
8 Loopy (recurrent) neural networks (RNNs) | p. 247 |
8.1 Remembering with recurrent networks | p. 250 |
Backpropagation through time | p. 255 |
When do we update what? | p. 257 |
Recap | p. 259 |
There's always a catch | p. 259 |
Recurrent neural net with Keras | p. 260 |
8.2 Putting things together | p. 264 |
8.3 Let's get to learning our past selves | p. 266 |
8.4 Hyperparameters | p. 267 |
8.5 Predicting | p. 269 |
Statefulness | p. 270 |
Two-way street | p. 271 |
What is this thing? | p. 272 |
9 Improving retention with long short-term memory networks | p. 274 |
9.1 LSTM | p. 275 |
Backpropagation through time | p. 284 |
Where does the rubber hit the road? | p. 287 |
Dirty data | p. 288 |
Back to the dirty data | p. 291 |
Words are hard. Letters are easier | p. 292 |
My turn to chat | p. 298 |
My turn to speak more clearly | p. 300 |
Learned how to say, but not yet what | p. 308 |
Other kinds of memory | p. 308 |
Going deeper | p. 309 |
10 Sequence-to-sequence models and attention | p. 311 |
10.1 Encoder-decoder architecture | p. 312 |
Decoding thought | p. 313 |
Look familiar? | p. 315 |
Sequence-to-sequence conversation | p. 316 |
LSTM review | p. 317 |
10.2 Assembling a sequence-to-sequence pipeline | p. 318 |
Preparing your dataset for the sequence-to-sequence training | p. 318 |
Sequence-to-sequence model in Keras | p. 320 |
Sequence encoder | p. 320 |
Thought decoder | p. 322 |
Assembling the sequence-to-sequence network | p. 323 |
10.3 Training the sequence-to-sequence network | p. 324 |
Generate output sequences | p. 325 |
10.4 Building a chatbot using sequence-to-sequence networks | p. 326 |
Preparing the corpus for your training | p. 326 |
Building your character dictionary | p. 327 |
Generate one-hot encoded training sets | p. 328 |
Train your sequence-to-sequence chatbot | p. 329 |
Assemble the model for sequence generation | p. 330 |
Predicting a sequence | p. 330 |
Generating a response | p. 331 |
Converse with your chatbot | p. 331 |
10.5 Enhancements | p. 332 |
Reduce training complexity with bucketing | p. 332 |
Paying attention | p. 333 |
10.6 In the real world | p. 334 |
Part 3 Getting Real (Real-World NLP Challenges) | p. 337 |
11 Information extraction (named entity extraction and question answering) | p. 339 |
11.1 Named entities and relations | p. 339 |
A knowledge base | p. 340 |
Information extraction | p. 343 |
11.2 Regular patterns | p. 343 |
Regular expressions | p. 344 |
Information extraction as ML feature extraction | p. 345 |
11.3 Information worth extracting | p. 346 |
Extracting GPS locations | p. 347 |
Extracting dates | p. 347 |
11.4 Extracting relationships (relations) | p. 352 |
Part-of-speech (POS) tagging | p. 353 |
Entity name normalization | p. 357 |
Relation normalization and extraction | p. 358 |
Word patterns | p. 358 |
Segmentation | p. 359 |
Why won't split('.!?') work? | p. 360 |
Sentence segmentation with regular expressions | p. 361 |
11.5 In the real world | p. 363 |
12 Getting chatty (dialog engines) | p. 365 |
12.1 Language skill | p. 366 |
Modern approaches | p. 367 |
A hybrid approach | p. 373 |
12.2 Pattern-matching approach | p. 373 |
A pattern-matching chatbot with AIML | p. 375 |
A network view of pattern matching | p. 381 |
12.3 Grounding | p. 382 |
12.4 Retrieval (search) | p. 384 |
The context challenge | p. 384 |
Example retrieval-based chatbot | p. 386 |
A search-based chatbot | p. 389 |
12.5 Generative models | p. 391 |
Chat about NLPIA | p. 392 |
Pros and cons of each approach | p. 394 |
12.6 Four-wheel drive | p. 395 |
The Will to succeed | p. 395 |
12.7 Design process | p. 396 |
12.8 Trickery | p. 399 |
Ask questions with predictable answers | p. 399 |
Be entertaining | p. 399 |
When all else fails, search | p. 400 |
Being popular | p. 400 |
Be a connector | p. 400 |
Getting emotional | p. 400 |
12.9 In the real world | p. 401 |
13 Scaling up (optimization, parallelization, and batch processing) | p. 403 |
13.1 Too much of a good thing (data) | p. 404 |
13.2 Optimizing NLP algorithms | p. 404 |
Indexing | p. 405 |
Advanced indexing | p. 406 |
Advanced indexing with Annoy | p. 408 |
Why use approximate indexes at all? | p. 412 |
An indexing workaround: discretizing | p. 413 |
13.3 Constant RAM algorithms | p. 414 |
Gensim | p. 414 |
Graph computing | p. 415 |
13.4 Parallelizing your NLP computations | p. 416 |
Training NLP models on GPUs | p. 416 |
Renting vs. buying | p. 417 |
GPU rental options | p. 418 |
Tensor processing units | p. 419 |
13.5 Reducing the memory footprint during model training | p. 419 |
13.6 Gaining model insights with TensorBoard | p. 422 |
How to visualize word embeddings | p. 423 |
Appendix A Your NLP tools | p. 427 |
Appendix B Playful Python and regular expressions | p. 434 |
Appendix C Vectors and matrices (linear algebra fundamentals) | p. 440 |
Appendix D Machine learning tools and techniques | p. 446 |
Appendix E Setting up your AWS GPU | p. 459 |
Appendix F Locality sensitive hashing | p. 473 |
Resources | p. 481 |
Glossary | p. 490 |
Index | p. 497 |