23. Appendix: Tools for Deep Learning

To get the most out of Dive into Deep Learning, this appendix walks you through the tools you will need, such as those for running the book's notebooks and for contributing to this interactive open-source book.
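If you have already worked through the Installation instructions, a quick sanity check can confirm that your environment is ready for the notebooks referenced throughout this appendix. The sketch below is only an illustration and assumes the PyTorch edition of the book with the d2l package installed; adapt it to your framework of choice.

# Minimal sanity check (assumes PyTorch and the d2l package are installed
# as described in the Installation chapter).
import torch
from d2l import torch as d2l

print(torch.__version__)          # installed framework version
print(torch.cuda.is_available())  # whether PyTorch can see a GPU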

  • 23.1. Using Jupyter Notebooks
    • 23.1.1. Editing and Running the Code Locally
    • 23.1.2. Advanced Options
    • 23.1.3. Summary
    • 23.1.4. Exercises
  • 23.2. Using Amazon SageMaker
    • 23.2.1. Signing Up
    • 23.2.2. Creating a SageMaker Instance
    • 23.2.3. Running and Stopping an Instance
    • 23.2.4. Updating Notebooks
    • 23.2.5. Summary
    • 23.2.6. Exercises
  • 23.3. Using AWS EC2 Instances
    • 23.3.1. Creating and Running an EC2 Instance
    • 23.3.2. Installing CUDA
    • 23.3.3. Installing Libraries for Running the Code
    • 23.3.4. Running the Jupyter Notebook Remotely
    • 23.3.5. Closing Unused Instances
    • 23.3.6. Summary
    • 23.3.7. Exercises
  • 23.4. Using Google Colab
    • 23.4.1. Summary
    • 23.4.2. Exercises
  • 23.5. Selecting Servers and GPUs
    • 23.5.1. Selecting Servers
    • 23.5.2. Selecting GPUs
    • 23.5.3. Summary
  • 23.6. Contributing to This Book
    • 23.6.1. Submitting Minor Changes
    • 23.6.2. Proposing Major Changes
    • 23.6.3. Submitting Major Changes
    • 23.6.4. Summary
    • 23.6.5. Exercises
  • 23.7. Utility Functions and Classes
  • 23.8. The d2l API Document
    • 23.8.1. Classes
    • 23.8.2. Functions