Item Barcode | Call Number | Material Type | Item Category 1 |
---|---|---|---|
30000010371668 | QA76.9.N38 L36 2019 | Open Access Book | Book |
Summary
Description
Modern NLP techniques based on machine learning radically improve the ability of software to recognize patterns, use context to infer meaning, and accurately discern intent from poorly structured text. In Natural Language Processing in Action, readers explore carefully chosen examples and expand their machine's knowledge, which they can then apply to a range of challenges.
Key Features
* Easy to follow
* Clear examples
* Hands-on guide
Audience
A basic understanding of machine learning and some experience with a modern programming language such as Python, Java, C++, or JavaScript will be helpful.
About the technology
Natural Language Processing (NLP) is the discipline of teaching computers to read more like people, and readers can see examples of it in everything from chatbots to the speech-recognition software on their phones.
Hobson Lane has more than 15 years of experience building autonomous systems that make important decisions on behalf of humans.
Hannes Hapke is an electrical engineer turned data scientist with experience in deep learning.
Cole Howard is a carpenter and writer turned deep learning expert.
Author Notes
Hobson Lane, Hannes Max Hapke, and Cole Howard are experienced NLP engineers who use these techniques in production.
Table of Contents
Foreword | p. xiii |
Preface | p. xv |
Acknowledgments | p. xxi |
About this book | p. xxiv |
About the authors | p. xxvii |
About the cover illustration | p. xxix |
Part 1 Wordy Machines | p. 1 |
1 Packets of thought (NLP overview) | p. 3 |
1.1 Natural language vs. programming language | p. 4 |
1.2 The magic | p. 4 |
Machines that converse | p. 5 |
The math | p. 6 |
1.3 Practical applications | p. 8 |
1.4 Language through a computer's "eyes" | p. 9 |
The language of locks | p. 10 |
Regular expressions | p. 11 |
A simple chatbot | p. 12 |
Another way | p. 16 |
1.5 A brief overflight of hyperspace | p. 19 |
1.6 Word order and grammar | p. 21 |
1.7 A chatbot natural language pipeline | p. 22 |
1.8 Processing in depth | p. 25 |
1.9 Natural language IQ | p. 27 |
2 Build your vocabulary (word tokenization) | p. 30 |
2.1 Challenges (a preview of stemming) | p. 32 |
2.2 Building your vocabulary with a tokenizer | p. 33 |
Dot product | p. 41 |
Measuring bag-of-words overlap | p. 42 |
A token improvement | p. 43 |
Extending your vocabulary with n-grams | p. 48 |
Normalizing your vocabulary | p. 54 |
2.3 Sentiment | p. 62 |
VADER: a rule-based sentiment analyzer | p. 64 |
Naive Bayes | p. 65 |
3 Math with words (TF-IDF vectors) | p. 70 |
3.1 Bag of words | p. 71 |
3.2 Vectorizing | p. 76 |
Vector spaces | p. 79 |
3.3 Zipf's Law | p. 83 |
3.4 Topic modeling | p. 86 |
Return of Zipf | p. 89 |
Relevance ranking | p. 90 |
Tools | p. 93 |
Alternatives | p. 93 |
Okapi BM25 | p. 95 |
What's next | p. 95 |
4 Finding meaning in word counts (semantic analysis) | p. 97 |
4.1 From word counts to topic scores | p. 98 |
TF-IDF vectors and lemmatization | p. 99 |
Topic vectors | p. 99 |
Thought experiment | p. 101 |
An algorithm for scoring topics | p. 105 |
An LDA classifier | p. 107 |
4.2 Latent semantic analysis | p. 111 |
Your thought experiment made real | p. 113 |
4.3 Singular value decomposition | p. 116 |
U: left singular vectors | p. 118 |
S: singular values | p. 119 |
VT: right singular vectors | p. 120 |
SVD matrix orientation | p. 120 |
Truncating the topics | p. 121 |
4.4 Principal component analysis | p. 123 |
PCA on 3D vectors | p. 125 |
Stop horsing around and get back to NLP | p. 126 |
Using PCA for SMS message semantic analysis | p. 128 |
Using truncated SVD for SMS message semantic analysis | p. 130 |
How well does LSA work for spam classification? | p. 131 |
4.5 Latent Dirichlet allocation (LDiA) | p. 134 |
The LDiA idea | p. 135 |
LDiA topic model for SMS messages | p. 137 |
LDiA + LDA = spam classifier | p. 140 |
A fairer comparison: 32 LDiA topics | p. 142 |
4.6 Distance and similarity | p. 143 |
4.7 Steering with feedback | p. 146 |
Linear discriminant analysis | p. 147 |
4.8 Topic vector power | p. 148 |
Semantic search | p. 150 |
Improvements | p. 152 |
Part 2 Deeper Learning (Neural Networks) | p. 153 |
5 Baby steps with neural networks (perceptrons and backpropagation) | p. 155 |
5.1 Neural networks, the ingredient list | p. 156 |
Perceptron | p. 157 |
A numerical perceptron | p. 157 |
Detour through bias | p. 158 |
Let's go skiing: the error surface | p. 172 |
Off the chair lift, onto the slope | p. 173 |
Let's shake things up a bit | p. 174 |
Keras: neural networks in Python | p. 175 |
Onward and deepward | p. 179 |
Normalization: input with style | p. 179 |
6 Reasoning with word vectors (Word2vec) | p. 181 |
6.1 Semantic queries and analogies | p. 182 |
Analogy questions | p. 183 |
6.2 Word vectors | p. 184 |
Vector-oriented reasoning | p. 187 |
How to compute Word2vec representations | p. 191 |
How to use the gensim.word2vec module | p. 200 |
How to generate your own word vector representations | p. 202 |
Word2vec vs. GloVe (Global Vectors) | p. 205 |
FastText | p. 205 |
Word2vec vs. LSA | p. 206 |
Visualizing word relationships | p. 207 |
Unnatural words | p. 214 |
Document similarity with Doc2vec | p. 215 |
7 Getting words in order with convolutional neural networks (CNNs) | p. 218 |
7.1 Learning meaning | p. 220 |
7.2 Toolkit | p. 221 |
7.3 Convolutional neural nets | p. 222 |
Building blocks | p. 223 |
Step size (stride) | p. 224 |
Filter composition | p. 224 |
Padding | p. 226 |
Learning | p. 228 |
7.4 Narrow windows indeed | p. 228 |
Implementation in Keras: prepping the data | p. 230 |
Convolutional neural network architecture | p. 235 |
Pooling | p. 236 |
Dropout | p. 238 |
The cherry on the sundae | p. 239 |
Let's get to learning (training) | p. 241 |
Using the model in a pipeline | p. 243 |
Where do you go from here? | p. 244 |
8 Loopy (recurrent) neural networks (RNNs) | p. 247 |
8.1 Remembering with recurrent networks | p. 250 |
Backpropagation through time | p. 255 |
When do we update what? | p. 257 |
Recap | p. 259 |
There's always a catch | p. 259 |
Recurrent neural net with Keras | p. 260 |
8.2 Putting things together | p. 264 |
8.3 Let's get to learning our past selves | p. 266 |
8.4 Hyperparameters | p. 267 |
8.5 Predicting | p. 269 |
Statefulness | p. 270 |
Two-way street | p. 271 |
What is this thing? | p. 272 |
9 Improving retention with long short-term memory networks | p. 274 |
9.1 LSTM | p. 275 |
Backpropagation through time | p. 284 |
Where does the rubber hit the road? | p. 287 |
Dirty data | p. 288 |
Back to the dirty data | p. 291 |
Words are hard. Letters are easier | p. 292 |
My turn to chat | p. 298 |
My turn to speak more clearly | p. 300 |
Learned how to say, but not yet what | p. 308 |
Other kinds of memory | p. 308 |
Going deeper | p. 309 |
10 Sequence-to-sequence models and attention | p. 311 |
10.1 Encoder-decoder architecture | p. 312 |
Decoding thought | p. 313 |
Look familiar? | p. 315 |
Sequence-to-sequence conversation | p. 316 |
LSTM review | p. 317 |
10.2 Assembling a sequence-to-sequence pipeline | p. 318 |
Preparing your dataset for the sequence-to-sequence training | p. 318 |
Sequence-to-sequence model in Keras | p. 320 |
Sequence encoder | p. 320 |
Thought decoder | p. 322 |
Assembling the sequence-to-sequence network | p. 323 |
10.3 Training the sequence-to-sequence network | p. 324 |
Generate output sequences | p. 325 |
10.4 Building a chatbot using sequence-to-sequence networks | p. 326 |
Preparing the corpus for your training | p. 326 |
Building your character dictionary | p. 327 |
Generate one-hot encoded training sets | p. 328 |
Train your sequence-to-sequence chatbot | p. 329 |
Assemble the model for sequence generation | p. 330 |
Predicting a sequence | p. 330 |
Generating a response | p. 331 |
Converse with your chatbot | p. 331 |
10.5 Enhancements | p. 332 |
Reduce training complexity with bucketing | p. 332 |
Paying attention | p. 333 |
10.6 In the real world | p. 334 |
Part 3 Getting Real (Real-World NLP Challenges) | p. 337 |
11 Information extraction (named entity extraction and question answering) | p. 339 |
11.1 Named entities and relations | p. 339 |
A knowledge base | p. 340 |
Information extraction | p. 343 |
11.2 Regular patterns | p. 343 |
Regular expressions | p. 344 |
Information extraction as ML feature extraction | p. 345 |
11.3 Information worth extracting | p. 346 |
Extracting GPS locations | p. 347 |
Extracting dates | p. 347 |
11.4 Extracting relationships (relations) | p. 352 |
Part-of-speech (POS) tagging | p. 353 |
Entity name normalization | p. 357 |
Relation normalization and extraction | p. 358 |
Word patterns | p. 358 |
Segmentation | p. 359 |
Why won't split('.!?') work? | p. 360 |
Sentence segmentation with regular expressions | p. 361 |
11.5 In the real world | p. 363 |
12 Getting chatty (dialog engines) | p. 365 |
12.1 Language skill | p. 366 |
Modern approaches | p. 367 |
A hybrid approach | p. 373 |
12.2 Pattern-matching approach | p. 373 |
A pattern-matching chatbot with AIML | p. 375 |
A network view of pattern matching | p. 381 |
12.3 Grounding | p. 382 |
12.4 Retrieval (search) | p. 384 |
The context challenge | p. 384 |
Example retrieval-based chatbot | p. 386 |
A search-based chatbot | p. 389 |
12.5 Generative models | p. 391 |
Chat about NLPIA | p. 392 |
Pros and cons of each approach | p. 394 |
12.6 Four-wheel drive | p. 395 |
The Will to succeed | p. 395 |
12.7 Design process | p. 396 |
12.8 Trickery | p. 399 |
Ask questions with predictable answers | p. 399 |
Be entertaining | p. 399 |
When all else fails, search | p. 400 |
Being popular | p. 400 |
Be a connector | p. 400 |
Getting emotional | p. 400 |
12.9 In the real world | p. 401 |
13 Scaling up (optimization, parallelization, and batch processing) | p. 403 |
13.1 Too much of a good thing (data) | p. 404 |
13.2 Optimizing NLP algorithms | p. 404 |
Indexing | p. 405 |
Advanced indexing | p. 406 |
Advanced indexing with Annoy | p. 408 |
Why use approximate indexes at all? | p. 412 |
An indexing workaround: discretizing | p. 413 |
13.3 Constant RAM algorithms | p. 414 |
Gensim | p. 414 |
Graph computing | p. 415 |
13.4 Parallelizing your NLP computations | p. 416 |
Training NLP models on GPUs | p. 416 |
Renting vs. buying | p. 417 |
GPU rental options | p. 418 |
Tensor processing units | p. 419 |
13.5 Reducing the memory footprint during model training | p. 419 |
13.6 Gaining model insights with TensorBoard | p. 422 |
How to visualize word embeddings | p. 423 |
Appendix A Your NLP tools | p. 427 |
Appendix B Playful Python and regular expressions | p. 434 |
Appendix C Vectors and matrices (linear algebra fundamentals) | p. 440 |
Appendix D Machine learning tools and techniques | p. 446 |
Appendix E Setting up your AWS GPU | p. 459 |
Appendix F Locality sensitive hashing | p. 473 |
Resources | p. 481 |
Glossary | p. 490 |
Index | p. 497 |