.. _sec_transposed_conv:

Transposed Convolution
======================

The layers we have introduced so far for convolutional neural networks,
including convolutional layers (:numref:`sec_conv_layer`) and pooling
layers (:numref:`sec_pooling`), often reduce the input width and
height, or keep them unchanged. Applications such as semantic
segmentation (:numref:`sec_semantic_segmentation`) and generative
adversarial networks (:numref:`sec_dcgan`), however, require predicting
a value for every pixel and therefore need to increase the input
width and height. Transposed convolution, also named
fractionally-strided convolution :cite:`Dumoulin.Visin.2016` or
deconvolution :cite:`Long.Shelhamer.Darrell.2015`, serves this
purpose.

.. code:: python

    from mxnet import np, npx, init
    from mxnet.gluon import nn
    from d2l import mxnet as d2l

    npx.set_np()

Basic 2D Transposed Convolution
-------------------------------

Let us consider the basic case in which both the input and output
channel numbers are 1, with zero padding and unit stride.
:numref:`fig_trans_conv` illustrates how transposed convolution with a
:math:`2\times 2` kernel is computed on a :math:`2\times 2` input
matrix.

.. _fig_trans_conv:

.. figure:: ../img/trans-conv.svg

    Transposed convolution layer with a :math:`2\times 2` kernel.

We can implement this operation for a given kernel matrix :math:`K` and
input matrix :math:`X` as follows.

.. code:: python

    def trans_conv(X, K):
        h, w = K.shape
        Y = np.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))
        for i in range(X.shape[0]):
            for j in range(X.shape[1]):
                Y[i: i + h, j: j + w] += X[i, j] * K
        return Y

Recall that convolution computes results by
``Y[i, j] = (X[i: i + h, j: j + w] * K).sum()`` (refer to ``corr2d`` in
:numref:`sec_conv_layer`), which summarizes input values through the
kernel. The transposed convolution, in contrast, broadcasts input
values through the kernel, which results in a larger output shape.
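
To make the contrast concrete, here is a minimal plain-Python sketch of
both update rules, using nested lists instead of MXNet tensors (the
``_plain`` names are ours, chosen to avoid clashing with the book's
functions): ``corr2d_plain`` summarizes one window of the input into
each output element, while ``trans_conv_plain`` broadcasts each input
element into one window of the output.

.. code:: python

    def corr2d_plain(X, K):
        # Convolution (cross-correlation): each output element
        # summarizes one h-by-w window of the input through the kernel.
        h, w = len(K), len(K[0])
        Y = [[0.0] * (len(X[0]) - w + 1) for _ in range(len(X) - h + 1)]
        for i in range(len(Y)):
            for j in range(len(Y[0])):
                Y[i][j] = sum(X[i + a][j + b] * K[a][b]
                              for a in range(h) for b in range(w))
        return Y

    def trans_conv_plain(X, K):
        # Transposed convolution: each input element is broadcast
        # through the kernel into one h-by-w window of the output.
        h, w = len(K), len(K[0])
        Y = [[0.0] * (len(X[0]) + w - 1) for _ in range(len(X) + h - 1)]
        for i in range(len(X)):
            for j in range(len(X[0])):
                for a in range(h):
                    for b in range(w):
                        Y[i + a][j + b] += X[i][j] * K[a][b]
        return Y

    K = [[0, 1], [2, 3]]
    print(trans_conv_plain([[0, 1], [2, 3]], K))
    # [[0.0, 0.0, 1.0], [0.0, 4.0, 6.0], [4.0, 12.0, 9.0]]

Note the shape arithmetic: convolution shrinks an :math:`n\times n`
input to :math:`(n-k+1)\times(n-k+1)`, while transposed convolution
grows it to :math:`(n+k-1)\times(n+k-1)`.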
Let us verify the results in :numref:`fig_trans_conv`.

.. code:: python

    X = np.array([[0, 1], [2, 3]])
    K = np.array([[0, 1], [2, 3]])
    trans_conv(X, K)

.. parsed-literal::
    :class: output

    array([[ 0.,  0.,  1.],
           [ 0.,  4.,  6.],
           [ 4., 12.,  9.]])

Alternatively, we can use ``nn.Conv2DTranspose`` to obtain the same
results. As with ``nn.Conv2D``, both the input and the kernel should be
4-D tensors.

.. code:: python

    X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)
    tconv = nn.Conv2DTranspose(1, kernel_size=2)
    tconv.initialize(init.Constant(K))
    tconv(X)

.. parsed-literal::
    :class: output

    array([[[[ 0.,  0.,  1.],
             [ 0.,  4.,  6.],
             [ 4., 12.,  9.]]]])

Padding, Strides, and Channels
------------------------------

In convolution, padding elements are applied to the input, while in
transposed convolution they are applied to the output. A
:math:`1\times 1` padding means we first compute the output as usual
and then remove the first and last rows and columns.
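
This cropping rule can be sketched in plain Python (the helper name
``trans_conv_pad`` is hypothetical, and nested lists stand in for
tensors): compute the full transposed-convolution output, then trim
``padding`` rows and columns from each border.

.. code:: python

    def trans_conv_pad(X, K, padding=0):
        # Full transposed-convolution output, as in trans_conv.
        h, w = len(K), len(K[0])
        Y = [[0.0] * (len(X[0]) + w - 1) for _ in range(len(X) + h - 1)]
        for i in range(len(X)):
            for j in range(len(X[0])):
                for a in range(h):
                    for b in range(w):
                        Y[i + a][j + b] += X[i][j] * K[a][b]
        # Padding is applied to the *output*: crop `padding` rows and
        # columns from every border.
        p = padding
        return [row[p:len(row) - p] for row in Y[p:len(Y) - p]] if p else Y

    K = [[0, 1], [2, 3]]
    print(trans_conv_pad([[0, 1], [2, 3]], K, padding=1))  # [[4.0]]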

.. code:: python

    tconv = nn.Conv2DTranspose(1, kernel_size=2, padding=1)
    tconv.initialize(init.Constant(K))
    tconv(X)

.. parsed-literal::
    :class: output

    array([[[[4.]]]])

Similarly, strides are applied to outputs as well.

.. code:: python

    tconv = nn.Conv2DTranspose(1, kernel_size=2, strides=2)
    tconv.initialize(init.Constant(K))
    tconv(X)

.. parsed-literal::
    :class: output

    array([[[[0., 0., 0., 1.],
             [0., 0., 2., 3.],
             [0., 2., 0., 3.],
             [4., 6., 6., 9.]]]])

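
One way to read the strided output above: the window written by input
element :math:`(i, j)` now starts at :math:`(i s, j s)`, so a stride of
:math:`s` grows each output dimension to :math:`s(n-1)+k` for an input
dimension of :math:`n` and kernel size :math:`k`. A plain-Python sketch
(the helper name is hypothetical):

.. code:: python

    def trans_conv_stride(X, K, stride=1):
        h, w = len(K), len(K[0])
        # Each output dimension grows to stride * (n - 1) + k.
        out_h = stride * (len(X) - 1) + h
        out_w = stride * (len(X[0]) - 1) + w
        Y = [[0.0] * out_w for _ in range(out_h)]
        for i in range(len(X)):
            for j in range(len(X[0])):
                for a in range(h):
                    for b in range(w):
                        # The window for input element (i, j) starts at
                        # (i * stride, j * stride) in the output.
                        Y[i * stride + a][j * stride + b] += X[i][j] * K[a][b]
        return Y

    K = [[0, 1], [2, 3]]
    print(trans_conv_stride([[0, 1], [2, 3]], K, stride=2))
    # [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 2.0, 3.0],
    #  [0.0, 2.0, 0.0, 3.0], [4.0, 6.0, 6.0, 9.0]]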
The multi-channel extension of the transposed convolution is the same
as for the convolution. When the input has multiple channels, denoted
by :math:`c_i`, the transposed convolution assigns a
:math:`k_h\times k_w` kernel matrix to each input channel. If the
output has :math:`c_o` channels, then we have a
:math:`c_i\times k_h\times k_w` kernel for each output channel.
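
A plain-Python sketch of this multi-channel rule (the helper name and
the ``K[i][o]`` kernel indexing are our own illustrative conventions;
Gluon stores ``Conv2DTranspose`` weights in its own layout): each
output channel sums the single-channel transposed convolutions over all
input channels.

.. code:: python

    def trans_conv_multi(X, K):
        # X: c_i input-channel matrices; K[i][o]: the k_h x k_w kernel
        # linking input channel i to output channel o.
        c_i, c_o = len(K), len(K[0])
        h, w = len(K[0][0]), len(K[0][0][0])
        n_h, n_w = len(X[0]), len(X[0][0])
        Y = [[[0.0] * (n_w + w - 1) for _ in range(n_h + h - 1)]
             for _ in range(c_o)]
        for o in range(c_o):
            for c in range(c_i):  # sum over input channels
                for i in range(n_h):
                    for j in range(n_w):
                        for a in range(h):
                            for b in range(w):
                                Y[o][i + a][j + b] += X[c][i][j] * K[c][o][a][b]
        return Y

    # Two identical input channels with identical kernels: the
    # single-channel result from earlier, doubled.
    X = [[[0, 1], [2, 3]]] * 2
    K = [[[[0, 1], [2, 3]]]] * 2
    print(trans_conv_multi(X, K))
    # [[[0.0, 0.0, 2.0], [0.0, 8.0, 12.0], [8.0, 24.0, 18.0]]]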
As a result, if we feed :math:`X` into a convolutional layer :math:`f`
to compute :math:`Y=f(X)` and create a transposed convolution layer
:math:`g` with the same hyperparameters as :math:`f`, except that the
number of output channels is set to the channel size of :math:`X`, then
:math:`g(Y)` should have the same shape as :math:`X`. Let us verify
this statement.

.. code:: python

    X = np.random.uniform(size=(1, 10, 16, 16))
    conv = nn.Conv2D(20, kernel_size=5, padding=2, strides=3)
    tconv = nn.Conv2DTranspose(10, kernel_size=5, padding=2, strides=3)
    conv.initialize()
    tconv.initialize()
    tconv(conv(X)).shape == X.shape

.. parsed-literal::
    :class: output

    True

Analogy to Matrix Transposition
-------------------------------

The transposed convolution takes its name from matrix transposition. In
fact, convolution operations can also be achieved by matrix
multiplication. In the example below, we define a :math:`3\times 3`
input :math:`X` and a :math:`2\times 2` kernel :math:`K`, and then use
``corr2d`` to compute the convolution output.

.. code:: python

    X = np.arange(9).reshape(3, 3)
    K = np.array([[0, 1], [2, 3]])
    Y = d2l.corr2d(X, K)
    Y

.. parsed-literal::
    :class: output

    array([[19., 25.],
           [37., 43.]])

Next, we rewrite the convolution kernel :math:`K` as a matrix
:math:`W`. Its shape will be :math:`(4, 9)`, where the
:math:`i^\mathrm{th}` row represents applying the kernel to the input
to generate the :math:`i^\mathrm{th}` output element.

.. code:: python

    def kernel2matrix(K):
        k, W = np.zeros(5), np.zeros((4, 9))
        k[:2], k[3:5] = K[0, :], K[1, :]
        W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k
        return W

    W = kernel2matrix(K)
    W

.. parsed-literal::
    :class: output

    array([[0., 1., 0., 2., 3., 0., 0., 0., 0.],
           [0., 0., 1., 0., 2., 3., 0., 0., 0.],
           [0., 0., 0., 0., 1., 0., 2., 3., 0.],
           [0., 0., 0., 0., 0., 1., 0., 2., 3.]])

Then the convolution operator can be implemented by matrix
multiplication with proper reshaping.

.. code:: python

    Y == np.dot(W, X.reshape(-1)).reshape(2, 2)

.. parsed-literal::
    :class: output

    array([[ True,  True],
           [ True,  True]])

We can implement the transposed convolution as a matrix multiplication
as well, by reusing ``kernel2matrix``. To reuse the generated
:math:`W`, we construct a :math:`2\times 2` input, so the corresponding
weight matrix is :math:`W^\top`, with shape :math:`(9, 4)`. Let us
verify the results.

.. code:: python

    X = np.array([[0, 1], [2, 3]])
    Y = trans_conv(X, K)
    Y == np.dot(W.T, X.reshape(-1)).reshape(3, 3)

.. parsed-literal::
    :class: output

    array([[ True,  True,  True],
           [ True,  True,  True],
           [ True,  True,  True]])

Summary
-------

- Compared to convolutions, which summarize inputs through kernels,
  transposed convolutions broadcast inputs.
- If a convolution layer reduces the input width and height by
  :math:`n_w` and :math:`n_h` times, respectively, then a transposed
  convolution layer with the same kernel sizes, padding, and strides
  will increase the input width and height by :math:`n_w` and
  :math:`n_h` times, respectively.
- We can implement convolution operations by matrix multiplication;
  the corresponding transposed convolution can then be done by
  multiplication with the transposed matrix.

Exercises
---------

1. Is it efficient to use matrix multiplication to implement convolution
   operations? Why?

`Discussions `__