Dive into Deep Learning
Table Of Contents
Dive into Deep Learning
Table Of Contents

3.5. Image Classification Data (Fashion-MNIST)

Before we implement softmax regression ourselves, let’s pick a real dataset to work with. To make things visually compelling, we will pick an image classification dataset. The most commonly used image classification data set is the MNIST handwritten digit recognition data set, proposed by LeCun, Cortes and Burges in the 1990s. However, even simple models achieve classification accuracy over 95% on MNIST, so it is hard to spot the differences between better models and weaker ones. In order to get a better intuition, we will use the qualitatively similar, but comparatively complex Fashion-MNIST dataset, proposed by Xiao, Rasul and Vollgraf in 2017.

3.5.1. Getting the Data

First, import the packages or modules required in this section.

import sys
sys.path.insert(0, '..')

%matplotlib inline
import d2l
from mxnet.gluon import data as gdata
import time

Conveniently, Gluon’s data package provides easy access to a number of benchmark datasets for testing our models. The first time we invoke data.vision.FashionMNIST(train=True) to collect the training data, Gluon will automatically retrieve the dataset via our Internet connection. Subsequently, Gluon will use the already-downloaded local copy. We specify whether we are requesting the training set or the test set by setting the value of the parameter train to True or False, respectively. Recall that we will only be using the training data for training, holding out the test set for a final evaluation of our model.

mnist_train = gdata.vision.FashionMNIST(train=True)
mnist_test = gdata.vision.FashionMNIST(train=False)

The number of images for each category in the training set and the testing set is 6,000 and 1,000, respectively. Since there are 10 categories, the numbers of examples in the training set and the test set are 60,000 and 10,000, respectively.

len(mnist_train), len(mnist_test)
(60000, 10000)

We can access any example by indexing into the dataset using square brackets []. In the following code, we access the image and label corresponding to the first example.

feature, label = mnist_train[0]

Our example, stored here in the variable feature corresponds to an image with a height and width of 28 pixels. Each pixel is an 8-bit unsigned integer (uint8) with values between 0 and 255. It is stored in a 3D NDArray. Its last dimension is the number of channels. Since the data set is a grayscale image, the number of channels is 1. When we encounter color, images, we’ll have 3 channels for red, green, and blue. To keep things simple, we will record the shape of the image with the height and width of \(h\) and \(w\) pixels, respectively, as \(h \times w\) or (h, w).

feature.shape, feature.dtype
((28, 28, 1), numpy.uint8)

The label of each image is represented as a scalar in NumPy. Its type is a 32-bit integer.

label, type(label), label.dtype
(2, numpy.int32, dtype('int32'))

There are 10 categories in Fashion-MNIST: t-shirt, trousers, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot. The following function can convert a numeric label into a corresponding text label.

# This function has been saved in the d2l package for future use
def get_fashion_mnist_labels(labels):
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

The following defines a function that can draw multiple images and corresponding labels in a single line.

# This function has been saved in the d2l package for future use
def show_fashion_mnist(images, labels):
    # Here _ means that we ignore (not use) variables
    _, figs = d2l.plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.reshape((28, 28)).asnumpy())

Next, let’s take a look at the image contents and text labels for the first nine examples in the training data set.

X, y = mnist_train[0:9]
show_fashion_mnist(X, get_fashion_mnist_labels(y))

3.5.2. Reading a Minibatch

To make our life easier when reading from the training and test sets we use a DataLoader rather than creating one from scratch, as we did in Section 3.2. Recall that a data loader reads a mini-batch of data with an example number of batch_size each time.

In practice, reading data can often be a significant performance bottleneck for training, especially when the model is simple or when the computer is fast. A handy feature of Gluon’s DataLoader is the ability to use multiple processes to speed up data reading (not currently supported on Windows). For instance, we can set aside 4 processes to read the data (via num_workers).

In addition, we convert the image data from uint8 to 32-bit floating point numbers using the ToTensor class. Beyond that we divide all numbers by 255 so that all pixels have values between 0 and 1. The ToTensor class also moves the image channel from the last dimension to the first dimension to facilitate the convolutional neural network calculations introduced later. Through the transform_first function of the data set, we apply the transformation of ToTensor to the first element of each data example (image and label), i.e., the image.

batch_size = 256
transformer = gdata.vision.transforms.ToTensor()
if sys.platform.startswith('win'):
    # 0 means no additional processes are needed to speed up the reading of
    # data
    num_workers = 0
    num_workers = 4

train_iter = gdata.DataLoader(mnist_train.transform_first(transformer),
                              batch_size, shuffle=True,
test_iter = gdata.DataLoader(mnist_test.transform_first(transformer),
                             batch_size, shuffle=False,

The logic that we will use to obtain and read the Fashion-MNIST data set is encapsulated in the d2l.load_data_fashion_mnist function, which we will use in later chapters. This function will return two variables, train_iter and test_iter. As the content of this book continues to deepen, we will further improve this function. Its full implementation will be described in Section 7.1.

Let’s look at the time it takes to read the training data.

start = time.time()
for X, y in train_iter:
'%.2f sec' % (time.time() - start)
'1.13 sec'

3.5.3. Summary

  • Fashion-MNIST is an apparel classification data set containing 10 categories, which we will use to test the performance of different algorithms in later chapters.

  • We store the shape of image using height and width of \(h\) and \(w\) pixels, respectively, as \(h \times w\) or (h, w).

  • Data iterators are a key component for efficient performance. Use existing ones if available.

3.5.4. Exercises

  1. Does reducing batch_size (for instance, to 1) affect read performance?

  2. For non-Windows users, try modifying num_workers to see how it affects read performance.

  3. Use the MXNet documentation to see which other datasets are available in mxnet.gluon.data.vision.

  4. Use the MXNet documentation to see which other transformations are available in mxnet.gluon.data.vision.transforms.

3.5.5. Scan the QR Code to Discuss