
4.3. Concise Implementation of Multilayer Perceptron

Now that we have learned how multilayer perceptrons (MLPs) work in theory, let's implement one. We begin, as always, by importing the required modules.

import sys
sys.path.insert(0, '..')  # make the d2l package importable from the parent directory

import d2l
from mxnet import gluon, init
from mxnet.gluon import loss as gloss, nn

4.3.1. The Model

The only difference from our softmax regression implementation is that we add two Dense (fully-connected) layers instead of one. The first is our hidden layer, which has 256 hidden units and applies the ReLU activation function; the second is our output layer, with one unit for each of the 10 classes.

net = nn.Sequential()
# Hidden layer with 256 units and ReLU activation
net.add(nn.Dense(256, activation='relu'))
# Output layer with one unit per class (10 classes in Fashion-MNIST)
net.add(nn.Dense(10))
# Initialize all weights from a normal distribution with standard deviation 0.01
net.initialize(init.Normal(sigma=0.01))

Note that, as above, we can invoke net.add() multiple times in succession, but we can also invoke it a single time, passing in multiple layers to be added to the network. Thus, we could have equivalently written net.add(nn.Dense(256, activation='relu'), nn.Dense(10)), as shown below. Again, note that as always, Gluon automatically infers the missing input dimension of each layer.
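For concreteness, here is that equivalent one-call construction in full; it builds exactly the same two-layer network as above.

# Equivalent construction: pass both layers to a single net.add() call.
# Gluon infers each layer's input dimension the first time data flows through.
net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'), nn.Dense(10))
net.initialize(init.Normal(sigma=0.01))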

Training the model follows the exact same steps as in our softmax regression implementation.

batch_size = 256
# Fashion-MNIST data iterators with minibatches of 256 examples
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

loss = gloss.SoftmaxCrossEntropyLoss()
# Minibatch stochastic gradient descent with a learning rate of 0.5
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.5})
num_epochs = 10
# The two None arguments (params and lr) are only needed by the from-scratch
# implementation; here the Gluon trainer handles the parameter updates.
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None,
              None, trainer)
epoch 1, loss 0.7941, train acc 0.700, test acc 0.808
epoch 2, loss 0.4878, train acc 0.818, test acc 0.842
epoch 3, loss 0.4250, train acc 0.843, test acc 0.861
epoch 4, loss 0.3968, train acc 0.853, test acc 0.868
epoch 5, loss 0.3699, train acc 0.863, test acc 0.872
epoch 6, loss 0.3561, train acc 0.869, test acc 0.870
epoch 7, loss 0.3412, train acc 0.874, test acc 0.876
epoch 8, loss 0.3284, train acc 0.879, test acc 0.877
epoch 9, loss 0.3153, train acc 0.882, test acc 0.882
epoch 10, loss 0.3090, train acc 0.886, test acc 0.879

4.3.2. Exercises

  1. Try adding a few more hidden layers to see how the result changes.

  2. Try out different activation functions. Which ones work best?

  3. Try out different initializations of the weights. (One possible starting point for all three exercises is sketched after this list.)
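As a starting point for these exercises (not a prescribed solution), the sketch below adds a second hidden layer, swaps ReLU for tanh, and replaces the normal initialization with Xavier initialization; the particular layer size and choices here are illustrative assumptions.

# Illustrative variation: an extra hidden layer (exercise 1),
# tanh activation (exercise 2), Xavier initialization (exercise 3).
# The 128-unit size of the second hidden layer is an arbitrary choice.
net = nn.Sequential()
net.add(nn.Dense(256, activation='tanh'),
        nn.Dense(128, activation='tanh'),
        nn.Dense(10))
net.initialize(init.Xavier())

Training this variant proceeds exactly as above with d2l.train_ch3.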
