4.3. Concise Implementation of Multilayer Perceptrons

Now that we have learned how multilayer perceptrons (MLPs) work in theory, let's implement them concisely using Gluon's high-level APIs. As always, we begin by importing the required modules.

import d2l
from mxnet import gluon, init, npx
from mxnet.gluon import nn
npx.set_np()  # switch MXNet to NumPy-compatible array semantics

4.3.1. The Model

The only difference from our softmax regression implementation is that we add two Dense (fully connected) layers instead of one. The first is our hidden layer, which has 256 hidden units and applies the ReLU activation function; the second is our output layer, which produces one output per class.

net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'),  # hidden layer: 256 units, ReLU
        nn.Dense(10))                      # output layer: one unit per class
net.initialize(init.Normal(sigma=0.01))    # draw weights from N(0, 0.01^2)

Note that, as always, Gluon automatically infers the missing input dimension of each layer the first time data flows through the network, so we never have to specify it ourselves.
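
To see this deferred initialization in action, we can push a small dummy minibatch through the network and then inspect the inferred weight shapes. This is a quick sketch; the shapes in the comments assume 784-dimensional inputs, i.e., flattened 28×28 Fashion-MNIST images.

from mxnet import np
X = np.random.uniform(size=(2, 784))  # dummy minibatch of two examples
net(X)  # the first forward pass triggers shape inference and initialization
net[0].weight.data().shape  # (256, 784): hidden layer weights
net[1].weight.data().shape  # (10, 256): output layer weights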

Training the model follows the exact same steps as in our softmax regression implementation.

batch_size, num_epochs = 256, 10
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)  # data iterators
loss = gluon.loss.SoftmaxCrossEntropyLoss()  # cross-entropy loss on the logits
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.5})  # minibatch SGD
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
[Plot: training loss, training accuracy, and test accuracy over the 10 epochs]
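
After training, we can sanity-check the model on held-out data. The following is a minimal sketch that computes the accuracy on a single test minibatch, assuming the network and iterators defined above.

for X, y in test_iter:  # grab one minibatch of test images and labels
    break
preds = net(X).argmax(axis=1)  # predicted class = index of the largest logit
acc = (preds.astype(y.dtype) == y).astype('float32').mean()
float(acc)  # fraction of correctly classified images in this batch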

4.3.2. Exercises

  1. Try adding a few more hidden layers to see how the result changes (a starting sketch follows this list).

  2. Try out different activation functions. Which ones work best?

  3. Try out different initializations of the weights.
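
As a starting point for these experiments, here is one possible variation touching all three exercises at once: a deeper network, a different activation, and a different initializer. The specific choices (an extra 128-unit layer, tanh, Xavier initialization) are illustrative assumptions, not recommendations.

net = nn.Sequential()
net.add(nn.Dense(256, activation='tanh'),  # first hidden layer, tanh instead of ReLU
        nn.Dense(128, activation='tanh'),  # an extra, narrower hidden layer
        nn.Dense(10))                      # output layer unchanged
net.initialize(init.Xavier())              # Xavier instead of N(0, 0.01^2)
# the trainer must be rebuilt so that it tracks the new network's parameters
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.5})
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)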
