Cover
Copyright
Table of Contents

Preface
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Safari
    How to Contact Us
    Acknowledgments

Chapter 1. Introduction
    The Supervised Learning Paradigm
    Observation and Target Encoding
    One-Hot Representation
    TF Representation
    TF-IDF Representation
    Target Encoding
    Computational Graphs
    PyTorch Basics
    Installing PyTorch
    Creating Tensors
    Tensor Types and Size
    Tensor Operations
    Indexing, Slicing, and Joining
    Tensors and Computational Graphs
    CUDA Tensors
    Exercises
    Solutions
    Summary
    References

Chapter 2. A Quick Tour of Traditional NLP
    Corpora, Tokens, and Types
    Unigrams, Bigrams, Trigrams, …, N-grams
    Lemmas and Stems
    Categorizing Sentences and Documents
    Categorizing Words: POS Tagging
    Categorizing Spans: Chunking and Named Entity Recognition
    Structure of Sentences
    Word Senses and Semantics
    Summary
    References

Chapter 3. Foundational Components of Neural Networks
    The Perceptron: The Simplest Neural Network
    Activation Functions
    Sigmoid
    Tanh
    ReLU
    Softmax
    Loss Functions
    Mean Squared Error Loss
    Categorical Cross-Entropy Loss
    Binary Cross-Entropy Loss
    Diving Deep into Supervised Training
    Constructing Toy Data
    Putting It Together: Gradient-Based Supervised Learning
    Auxiliary Training Concepts
    Correctly Measuring Model Performance: Evaluation Metrics
    Correctly Measuring Model Performance: Splitting the Dataset
    Knowing When to Stop Training
    Finding the Right Hyperparameters
    Regularization
    Example: Classifying Sentiment of Restaurant Reviews
    The Yelp Review Dataset
    Understanding PyTorch’s Dataset Representation
    The Vocabulary, the Vectorizer, and the DataLoader
    A Perceptron Classifier
    The Training Routine
    Evaluation, Inference, and Inspection
    Summary
    References

Chapter 4. Feed-Forward Networks for Natural Language Processing
    The Multilayer Perceptron
    A Simple Example: XOR
    Implementing MLPs in PyTorch
    Example: Surname Classification with an MLP
    The Surnames Dataset
    Vocabulary, Vectorizer, and DataLoader
    The SurnameClassifier Model
    The Training Routine
    Model Evaluation and Prediction
    Regularizing MLPs: Weight Regularization and Structural Regularization (or Dropout)
    Convolutional Neural Networks
    CNN Hyperparameters
    Implementing CNNs in PyTorch
    Example: Classifying Surnames by Using a CNN
    The SurnameDataset Class
    Vocabulary, Vectorizer, and DataLoader
    Reimplementing the SurnameClassifier with Convolutional Networks
    The Training Routine
    Model Evaluation and Prediction
    Miscellaneous Topics in CNNs
    Pooling
    Batch Normalization (BatchNorm)
    Network-in-Network Connections (1x1 Convolutions)
    Residual Connections/Residual Block
    Summary
    References

Chapter 5. Embedding Words and Types
    Why Learn Embeddings?
    Efficiency of Embeddings
    Approaches to Learning Word Embeddings
    The Practical Use of Pretrained Word Embeddings
    Example: Learning the Continuous Bag of Words Embeddings
    The Frankenstein Dataset
    Vocabulary, Vectorizer, and DataLoader
    The CBOWClassifier Model
    The Training Routine
    Model Evaluation and Prediction
    Example: Transfer Learning Using Pretrained Embeddings for Document Classification
    The AG News Dataset
    Vocabulary, Vectorizer, and DataLoader
    The NewsClassifier Model
    The Training Routine
    Model Evaluation and Prediction
    Summary
    References

Chapter 6. Sequence Modeling for Natural Language Processing
    Introduction to Recurrent Neural Networks
    Implementing an Elman RNN
    Example: Classifying Surname Nationality Using a Character RNN
    The SurnameDataset Class
    The Vectorization Data Structures
    The SurnameClassifier Model
    The Training Routine and Results
    Summary
    References

Chapter 7. Intermediate Sequence Modeling for Natural Language Processing
    The Problem with Vanilla RNNs (or Elman RNNs)
    Gating as a Solution to a Vanilla RNN’s Challenges
    Example: A Character RNN for Generating Surnames
    The SurnameDataset Class
    The Vectorization Data Structures
    From the ElmanRNN to the GRU
    Model 1: The Unconditioned SurnameGenerationModel
    Model 2: The Conditioned SurnameGenerationModel
    The Training Routine and Results
    Tips and Tricks for Training Sequence Models
    References

Chapter 8. Advanced Sequence Modeling for Natural Language Processing
    Sequence-to-Sequence Models, Encoder–Decoder Models, and Conditioned Generation
    Capturing More from a Sequence: Bidirectional Recurrent Models
    Capturing More from a Sequence: Attention
    Attention in Deep Neural Networks
    Evaluating Sequence Generation Models
    Example: Neural Machine Translation
    The Machine Translation Dataset
    A Vectorization Pipeline for NMT
    Encoding and Decoding in the NMT Model
    The Training Routine and Results
    Summary
    References

Chapter 9. Classics, Frontiers, and Next Steps
    What Have We Learned so Far?
    Timeless Topics in NLP
    Dialogue and Interactive Systems
    Discourse
    Information Extraction and Text Mining
    Document Analysis and Retrieval
    Frontiers in NLP
    Design Patterns for Production NLP Systems
    Where Next?
    References

Index
About the Authors
Colophon