.. _sec_linear_gluon:
Concise Implementation of Linear Regression
===========================================
Broad and intense interest in deep learning for the past several years
has inspired companies, academics, and hobbyists alike to develop a
variety of mature open source frameworks for automating the repetitive
work of implementing gradient-based learning algorithms. In the previous
section, we relied only on (i) ``ndarray`` for data storage and linear
algebra; and (ii) ``autograd`` for calculating derivatives. In practice,
because data iterators, loss functions, optimizers, and neural network
layers (and some whole architectures) are so common, modern libraries
implement these components for us as well.
In this section, we will show you how to implement the linear regression
model from :numref:`sec_linear_scratch` concisely by using Gluon.
Generating the Dataset
----------------------
To start, we will generate the same dataset as in the previous section.
.. code:: python

    import d2l
    from mxnet import autograd, gluon, np, npx
    npx.set_np()

    true_w = np.array([2, -3.4])
    true_b = 4.2
    features, labels = d2l.synthetic_data(true_w, true_b, 1000)
Reading the Dataset
-------------------
Rather than rolling our own iterator, we can call upon Gluon’s ``data``
module to read data. The first step will be to instantiate an
``ArrayDataset``. This object’s constructor takes one or more
``ndarray``\ s as arguments. Here, we pass in ``features`` and
``labels`` as arguments. Next, we will use the ``ArrayDataset`` to
instantiate a ``DataLoader``, which also requires that we specify a
``batch_size`` and specify a Boolean value ``shuffle`` indicating
whether or not we want the ``DataLoader`` to shuffle the data on each
epoch (pass through the dataset).
.. code:: python

    # Saved in the d2l package for later use
    def load_array(data_arrays, batch_size, is_train=True):
        """Construct a Gluon data loader."""
        dataset = gluon.data.ArrayDataset(*data_arrays)
        return gluon.data.DataLoader(dataset, batch_size, shuffle=is_train)

    batch_size = 10
    data_iter = load_array((features, labels), batch_size)
Now we can use ``data_iter`` in much the same way as we called the
``data_iter`` function in the previous section. To verify that it is
working, we can read and print the first minibatch of instances.
.. code:: python

    for X, y in data_iter:
        print(X, '\n', y)
        break
.. parsed-literal::
    :class: output

    [[-0.7903738  -1.883068  ]
     [ 0.46760127 -0.16282491]
     [-0.47508195 -0.24207895]
     [ 1.6323917  -0.96297354]
     [ 0.2444218  -0.68106437]
     [-0.36137256  0.98650014]
     [ 0.5429998  -0.31464806]
     [ 0.35655284 -0.72057074]
     [ 0.1221676  -0.00258584]
     [ 1.6356094   0.14286116]]
    [ 9.012126   5.6876993  4.045882  10.743488   7.012023   0.11717813
      6.3510137  7.3593645  4.454032   6.979927 ]
Defining the Model
------------------
When we implemented linear regression from scratch (in
:numref:`sec_linear_scratch`), we defined our model parameters
explicitly and coded up the calculations to produce output using basic
linear algebra operations. You *should* know how to do this. But once
your models get more complex, and once you have to do this nearly every
day, you will be glad for the assistance. The situation is similar to
coding up your own blog from scratch. Doing it once or twice is
rewarding and instructive, but you would be a lousy web developer if
every time you needed a blog you spent a month reinventing the wheel.
For standard operations, we can use Gluon’s predefined layers, which
allow us to focus on which layers are used to construct the model
rather than on their implementation. To define a linear
model, we first import the ``nn`` module, which defines a large number
of neural network layers (note that “nn” is an abbreviation for neural
networks). We will first define a model variable ``net``, which will
refer to an instance of the ``Sequential`` class. In Gluon,
``Sequential`` defines a container for several layers that will be
chained together. Given input data, a ``Sequential`` passes it through
the first layer, in turn passing the output as the second layer’s input
and so forth. In the following example, our model consists of only one
layer, so we do not really need ``Sequential``. But since nearly all of
our future models will involve multiple layers, we will use it anyway
just to familiarize you with the most standard workflow.
.. code:: python

    from mxnet.gluon import nn
    net = nn.Sequential()
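The chaining behavior described above can be sketched with a toy container in plain Python. This is a hypothetical illustration, not Gluon's actual implementation: each layer's output becomes the next layer's input.

.. code:: python

    class TinySequential:
        """Toy container mimicking how Sequential chains layers."""
        def __init__(self):
            self.layers = []

        def add(self, layer):
            self.layers.append(layer)

        def __call__(self, X):
            for layer in self.layers:
                X = layer(X)  # output of one layer feeds the next
            return X

    toy_net = TinySequential()
    toy_net.add(lambda x: x + 1)
    toy_net.add(lambda x: x * 2)
    print(toy_net(3))  # (3 + 1) * 2 = 8

Real Gluon layers are stateful objects rather than lambdas, but the data flow through the container is the same.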
Recall the architecture of a single-layer network as shown in
:numref:`fig_singleneuron`. The layer is said to be *fully-connected*
because each of its inputs is connected to each of its outputs by means
of a matrix-vector multiplication. In Gluon, the fully-connected layer
is defined in the ``Dense`` class. Since we only want to generate a
single scalar output, we set that number to :math:`1`.
.. _fig_singleneuron:

.. figure:: ../img/singleneuron.svg

   Linear regression is a single-layer neural network.

.. code:: python

    net.add(nn.Dense(1))
It is worth noting that, for convenience, Gluon does not require us to
specify the input shape for each layer. So here, we do not need to tell
Gluon how many inputs go into this linear layer. When we first try to
pass data through our model, e.g., when we execute ``net(X)`` later,
Gluon will automatically infer the number of inputs to each layer. We
will describe how this works in more detail in the chapter “Deep
Learning Computation”.
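As a rough mental model (a NumPy sketch, not Gluon's implementation), the fully-connected layer computes a matrix-vector product plus a bias, and the number of input features only needs to be known once the first batch of data arrives:

.. code:: python

    import numpy as np

    def dense_forward(X, w, b):
        """Compute the fully-connected layer output X w + b."""
        return X @ w + b

    X = np.array([[1.0, 2.0], [3.0, 4.0]])  # batch of 2 examples, 2 features
    num_inputs = X.shape[1]                 # inferred from the data: 2
    w = np.ones((num_inputs, 1))            # one output unit, as in Dense(1)
    b = np.zeros(1)
    print(dense_forward(X, w, b))           # shape (2, 1)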
Initializing Model Parameters
-----------------------------
Before using ``net``, we need to initialize the model parameters, such
as the weights and biases in the linear regression model. We will import
the ``initializer`` module from MXNet. This module provides various
methods for model parameter initialization. Gluon makes ``init``
available as a shortcut (abbreviation) to access the ``initializer``
package. By calling ``init.Normal(sigma=0.01)``, we specify that each
*weight* parameter should be randomly sampled from a normal distribution
with mean :math:`0` and standard deviation :math:`0.01`. The *bias*
parameter will be initialized to zero by default. Both the weight vector
and bias will have attached gradients.
.. code:: python

    from mxnet import init
    net.initialize(init.Normal(sigma=0.01))
The code above may look straightforward, but you should note that
something strange is happening here. We are initializing parameters for
a network even though Gluon does not yet know how many dimensions the
input will have! It might be :math:`2` as in our example or it might be
:math:`2000`. Gluon lets us get away with this because behind the
scenes, the initialization is actually *deferred*. The real
initialization will take place only when we attempt to pass data
through the network for the first time. Just remember that since
the parameters have not been initialized yet, we cannot access or
manipulate them.
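The idea of deferred initialization can be illustrated with a small, hypothetical Python class (again a sketch, not Gluon code): parameters are allocated, and sampled from :math:`\mathcal{N}(0, 0.01)`, only when the first batch reveals the input dimension.

.. code:: python

    import numpy as np

    class LazyDense:
        """Toy layer that defers parameter allocation until first use."""
        def __init__(self, units):
            self.units = units
            self.w = None  # not allocated yet: input size is unknown
            self.b = None

        def __call__(self, X):
            if self.w is None:  # first call: infer input size from the data
                num_inputs = X.shape[1]
                self.w = np.random.normal(0, 0.01, (num_inputs, self.units))
                self.b = np.zeros(self.units)
            return X @ self.w + self.b

    layer = LazyDense(1)
    assert layer.w is None           # parameters do not exist yet
    out = layer(np.ones((4, 3)))     # first forward pass triggers allocation
    assert layer.w.shape == (3, 1)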
Defining the Loss Function
--------------------------
In Gluon, the ``loss`` module defines various loss functions. We will
import the ``loss`` module under the alias ``gloss`` to avoid
confusing it with the variable holding our chosen loss function. In this
example, we will use the Gluon implementation of squared loss
(``L2Loss``).
.. code:: python

    from mxnet.gluon import loss as gloss
    loss = gloss.L2Loss()  # The squared loss is also known as the L2 norm loss
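Concretely, ``L2Loss`` computes half the squared difference between prediction and label for each example (the factor of :math:`1/2` makes the gradient cleaner). A minimal NumPy sketch of that formula:

.. code:: python

    import numpy as np

    def l2_loss(y_hat, y):
        """Squared loss, as computed per example by Gluon's L2Loss."""
        return (y_hat - y) ** 2 / 2

    print(l2_loss(np.array([2.5]), np.array([2.0])))  # [0.125]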
Defining the Optimization Algorithm
-----------------------------------
Minibatch SGD and related variants are standard tools for optimizing
neural networks and thus Gluon supports SGD alongside a number of
variations on this algorithm through its ``Trainer`` class. When we
instantiate the ``Trainer``, we will specify the parameters to optimize
over (obtainable from our net via ``net.collect_params()``), the
optimization algorithm we wish to use (``sgd``), and a dictionary of
hyper-parameters required by our optimization algorithm. SGD just
requires that we set the value ``learning_rate`` (here we set it to
0.03).
.. code:: python

    from mxnet import gluon
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': 0.03})
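The update that ``trainer.step(batch_size)`` performs can be sketched in NumPy (an illustration of minibatch SGD, not Gluon internals): the stored gradient is a sum over the minibatch, so the step rescales it by ``1/batch_size`` before taking a step of size ``learning_rate``.

.. code:: python

    import numpy as np

    def sgd_step(params, grads, lr, batch_size):
        """One minibatch SGD update; grads are sums over the minibatch."""
        return [p - lr * g / batch_size for p, g in zip(params, grads)]

    w = np.array([1.0, 1.0])
    grad = np.array([10.0, -10.0])   # sum of per-example gradients
    w, = sgd_step([w], [grad], lr=0.03, batch_size=10)
    print(w)  # [0.97 1.03]

This rescaling is why the minibatch size must be passed to ``step``, a detail that Exercise 1 below asks you to think through.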
Training
--------
You might have noticed that expressing our model through Gluon requires
comparatively few lines of code. We did not have to individually
allocate parameters, define our loss function, or implement stochastic
gradient descent. Once we start working with much more complex models,
Gluon’s advantages will grow considerably. However, once we have all the
basic pieces in place, the training loop itself is strikingly similar to
what we did when implementing everything from scratch.
To refresh your memory: for some number of epochs, we will make a
complete pass over the dataset (``data_iter``), iteratively grabbing one
minibatch of inputs and the corresponding ground-truth labels. For each
minibatch, we go through the following ritual:
- Generate predictions by calling ``net(X)`` and calculate the loss
  ``l`` (the forward pass).
- Calculate gradients by calling ``l.backward()`` (the backward pass).
- Update the model parameters by invoking our SGD optimizer (note that
  ``trainer`` already knows which parameters to optimize over, so we
  just need to pass in the minibatch size).
For good measure, we compute the loss after each epoch and print it to
monitor progress.
.. code:: python

    num_epochs = 3
    for epoch in range(1, num_epochs + 1):
        for X, y in data_iter:
            with autograd.record():
                l = loss(net(X), y)
            l.backward()
            trainer.step(batch_size)
        l = loss(net(features), labels)
        print('epoch %d, loss: %f' % (epoch, l.mean().asnumpy()))
.. parsed-literal::
    :class: output

    epoch 1, loss: 0.024881
    epoch 2, loss: 0.000090
    epoch 3, loss: 0.000051
Below, we compare the model parameters learned by training on finite
data and the actual parameters that generated our dataset. To access
parameters with Gluon, we first access the layer that we need from
``net`` and then access that layer’s weight (``weight``) and bias
(``bias``). To access each parameter’s values as an ``ndarray``, we
invoke its ``data`` method. As in our from-scratch implementation, note
that our estimated parameters are close to their ground truth
counterparts.
.. code:: python

    w = net[0].weight.data()
    print('Error in estimating w', true_w.reshape(w.shape) - w)
    b = net[0].bias.data()
    print('Error in estimating b', true_b - b)
.. parsed-literal::
    :class: output

    Error in estimating w [[ 0.00056791 -0.0002799 ]]
    Error in estimating b [0.00054121]
Summary
-------
- Using Gluon, we can implement models much more succinctly.
- In Gluon, the ``data`` module provides tools for data processing, the
  ``nn`` module defines a large number of neural network layers, and
  the ``loss`` module defines many common loss functions.
- MXNet’s module ``initializer`` provides various methods for model
  parameter initialization.
- Dimensionality and storage are automatically inferred (but be careful
  not to attempt to access parameters before they have been
  initialized).
Exercises
---------
1. If we replace ``l = loss(output, y)`` with
   ``l = loss(output, y).mean()``, we need to change
   ``trainer.step(batch_size)`` to ``trainer.step(1)`` for the code to
   behave identically. Why?
2. Review the MXNet documentation to see what loss functions and
   initialization methods are provided in the modules ``gluon.loss`` and
   ``init``. Replace the loss with Huber’s loss.
3. How do you access the gradient of ``dense.weight``?
`Discussions `__
-------------------------------------------------
|image0|
.. |image0| image:: ../img/qr_linear-regression-gluon.svg