.. _sec_linear-algebra:
Linear Algebra
==============
By now, we can load datasets into tensors and manipulate these tensors
with basic mathematical operations. To start building sophisticated
models, we will also need a few tools from linear algebra. This section
offers a gentle introduction to the most essential concepts, starting
from scalar arithmetic and ramping up to matrix multiplication.
.. code:: python
import torch
.. code:: python
from mxnet import np, npx
npx.set_np()
.. code:: python
from jax import numpy as jnp
.. code:: python
import tensorflow as tf
Scalars
-------
Most everyday mathematics consists of manipulating numbers one at a
time. Formally, we call these values *scalars*. For example, the
temperature in Palo Alto is a balmy :math:`72` degrees Fahrenheit. If
you wanted to convert the temperature to Celsius you would evaluate the
expression :math:`c = \frac{5}{9}(f - 32)`, setting :math:`f` to
:math:`72`. In this equation, the values :math:`5`, :math:`9`, and
:math:`32` are constant scalars. The variables :math:`c` and :math:`f`
in general represent unknown scalars.
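For instance, this conversion is easy to express in plain Python (a
small illustrative sketch; the function name is our own):
.. code:: python
def fahrenheit_to_celsius(f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return 5 / 9 * (f - 32)
fahrenheit_to_celsius(72)  # roughly 22.2 degrees Celsius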
We denote scalars by ordinary lower-cased letters (e.g., :math:`x`,
:math:`y`, and :math:`z`) and the space of all (continuous)
*real-valued* scalars by :math:`\mathbb{R}`. For expedience, we will
skip past rigorous definitions of *spaces*: just remember that the
expression :math:`x \in \mathbb{R}` is a formal way to say that
:math:`x` is a real-valued scalar. The symbol :math:`\in` (pronounced
“in”) denotes membership in a set. For example,
:math:`x, y \in \{0, 1\}` indicates that :math:`x` and :math:`y` are
variables that can only take values :math:`0` or :math:`1`.
Scalars are implemented as tensors that contain only one element. Below,
we assign two scalars and perform the familiar addition, multiplication,
division, and exponentiation operations.
.. code:: python
x = torch.tensor(3.0)
y = torch.tensor(2.0)
x + y, x * y, x / y, x**y
.. parsed-literal::
:class: output
(tensor(5.), tensor(6.), tensor(1.5000), tensor(9.))
.. code:: python
x = np.array(3.0)
y = np.array(2.0)
x + y, x * y, x / y, x ** y
.. parsed-literal::
:class: output
(array(5.), array(6.), array(1.5), array(9.))
.. code:: python
x = jnp.array(3.0)
y = jnp.array(2.0)
x + y, x * y, x / y, x**y
.. parsed-literal::
:class: output
(Array(5., dtype=float32, weak_type=True),
Array(6., dtype=float32, weak_type=True),
Array(1.5, dtype=float32, weak_type=True),
Array(9., dtype=float32, weak_type=True))
.. code:: python
x = tf.constant(3.0)
y = tf.constant(2.0)
x + y, x * y, x / y, x**y
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(), dtype=float32, numpy=5.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=6.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=1.5>,
 <tf.Tensor: shape=(), dtype=float32, numpy=9.0>)
Vectors
-------
For current purposes, you can think of a vector as a fixed-length array
of scalars. As with their code counterparts, we call these scalars the
*elements* of the vector (synonyms include *entries* and *components*).
When vectors represent examples from real-world datasets, their values
hold some real-world significance. For example, if we were training a
model to predict the risk of a loan defaulting, we might associate each
applicant with a vector whose components correspond to quantities like
their income, length of employment, or number of previous defaults. If
we were studying the risk of heart attack, each vector might represent a
patient and its components might correspond to their most recent vital
signs, cholesterol levels, minutes of exercise per day, etc. We denote
vectors by bold lowercase letters, (e.g., :math:`\mathbf{x}`,
:math:`\mathbf{y}`, and :math:`\mathbf{z}`).
Vectors are implemented as :math:`1^{\textrm{st}}`-order tensors. In
general, such tensors can have arbitrary lengths, subject to memory
limitations. Caution: in Python, as in most programming languages,
vector indices start at :math:`0`, also known as *zero-based indexing*,
whereas in linear algebra subscripts begin at :math:`1` (one-based
indexing).
.. code:: python
x = torch.arange(3)
x
.. parsed-literal::
:class: output
tensor([0, 1, 2])
.. code:: python
x = np.arange(3)
x
.. parsed-literal::
:class: output
array([0., 1., 2.])
.. code:: python
x = jnp.arange(3)
x
.. parsed-literal::
:class: output
Array([0, 1, 2], dtype=int32)
.. code:: python
x = tf.range(3)
x
.. parsed-literal::
:class: output
<tf.Tensor: shape=(3,), dtype=int32, numpy=array([0, 1, 2], dtype=int32)>
We can refer to an element of a vector by using a subscript. For
example, :math:`x_2` denotes the second element of :math:`\mathbf{x}`.
Since :math:`x_2` is a scalar, we do not bold it. By default, we
visualize vectors by stacking their elements vertically.
.. math:: \mathbf{x} =\begin{bmatrix}x_{1} \\ \vdots \\x_{n}\end{bmatrix},
:label: eq_vec_def
Here :math:`x_1, \ldots, x_n` are elements of the vector. Later on, we
will distinguish between such *column vectors* and *row vectors* whose
elements are stacked horizontally. Recall that we access a tensor’s
elements via indexing.
.. code:: python
x[2]
.. parsed-literal::
:class: output
tensor(2)
.. code:: python
x[2]
.. parsed-literal::
:class: output
array(2.)
.. code:: python
x[2]
.. parsed-literal::
:class: output
Array(2, dtype=int32)
.. code:: python
x[2]
.. parsed-literal::
:class: output
<tf.Tensor: shape=(), dtype=int32, numpy=2>
To indicate that a vector contains :math:`n` elements, we write
:math:`\mathbf{x} \in \mathbb{R}^n`. Formally, we call :math:`n` the
*dimensionality* of the vector. In code, this corresponds to the
tensor’s length, accessible via Python’s built-in ``len`` function.
.. code:: python
len(x)
.. parsed-literal::
:class: output
3
.. code:: python
len(x)
.. parsed-literal::
:class: output
3
.. code:: python
len(x)
.. parsed-literal::
:class: output
3
.. code:: python
len(x)
.. parsed-literal::
:class: output
3
We can also access the length via the ``shape`` attribute. The shape is
a tuple that indicates a tensor’s length along each axis. Tensors with
just one axis have shapes with just one element.
.. code:: python
x.shape
.. parsed-literal::
:class: output
torch.Size([3])
.. code:: python
x.shape
.. parsed-literal::
:class: output
(3,)
.. code:: python
x.shape
.. parsed-literal::
:class: output
(3,)
.. code:: python
x.shape
.. parsed-literal::
:class: output
TensorShape([3])
Oftentimes, the word “dimension” gets overloaded to mean both the number
of axes and the length along a particular axis. To avoid this confusion,
we use *order* to refer to the number of axes and *dimensionality*
exclusively to refer to the number of components.
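As a minimal sketch of this distinction (shown in PyTorch; the other
frameworks behave analogously), the order of a tensor is its number of
axes, reported by ``ndim``, while the dimensionality of a vector is its
number of elements:
.. code:: python
v = torch.arange(3)                # a vector: order 1, dimensionality 3
M = torch.arange(6).reshape(2, 3)  # a matrix: order 2
v.ndim, len(v), M.ndim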
Matrices
--------
Just as scalars are :math:`0^{\textrm{th}}`-order tensors and vectors
are :math:`1^{\textrm{st}}`-order tensors, matrices are
:math:`2^{\textrm{nd}}`-order tensors. We denote matrices by bold
capital letters (e.g., :math:`\mathbf{X}`, :math:`\mathbf{Y}`, and
:math:`\mathbf{Z}`), and represent them in code by tensors with two
axes. The expression :math:`\mathbf{A} \in \mathbb{R}^{m \times n}`
indicates that a matrix :math:`\mathbf{A}` contains :math:`m \times n`
real-valued scalars, arranged as :math:`m` rows and :math:`n` columns.
When :math:`m = n`, we say that a matrix is *square*. Visually, we can
illustrate any matrix as a table. To refer to an individual element, we
subscript both the row and column indices, e.g., :math:`a_{ij}` is the
value that belongs to :math:`\mathbf{A}`\ ’s :math:`i^{\textrm{th}}` row
and :math:`j^{\textrm{th}}` column:
.. math:: \mathbf{A}=\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix}.
:label: eq_matrix_def
In code, we represent a matrix
:math:`\mathbf{A} \in \mathbb{R}^{m \times n}` by a
:math:`2^{\textrm{nd}}`-order tensor with shape (:math:`m`, :math:`n`).
We can convert any appropriately sized :math:`m \times n` tensor into an
:math:`m \times n` matrix by passing the desired shape to ``reshape``:
.. code:: python
A = torch.arange(6).reshape(3, 2)
A
.. parsed-literal::
:class: output
tensor([[0, 1],
[2, 3],
[4, 5]])
.. code:: python
A = np.arange(6).reshape(3, 2)
A
.. parsed-literal::
:class: output
array([[0., 1.],
[2., 3.],
[4., 5.]])
.. code:: python
A = jnp.arange(6).reshape(3, 2)
A
.. parsed-literal::
:class: output
Array([[0, 1],
[2, 3],
[4, 5]], dtype=int32)
.. code:: python
A = tf.reshape(tf.range(6), (3, 2))
A
.. parsed-literal::
:class: output
<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[0, 1],
       [2, 3],
       [4, 5]], dtype=int32)>
Sometimes we want to flip the axes. When we exchange a matrix’s rows and
columns, the result is called its *transpose*. Formally, we signify a
matrix :math:`\mathbf{A}`\ ’s transpose by :math:`\mathbf{A}^\top` and
if :math:`\mathbf{B} = \mathbf{A}^\top`, then :math:`b_{ij} = a_{ji}`
for all :math:`i` and :math:`j`. Thus, the transpose of an
:math:`m \times n` matrix is an :math:`n \times m` matrix:
.. math::
\mathbf{A}^\top =
\begin{bmatrix}
a_{11} & a_{21} & \dots & a_{m1} \\
a_{12} & a_{22} & \dots & a_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \dots & a_{mn}
\end{bmatrix}.
In code, we can access any matrix’s transpose as follows:
.. code:: python
A.T
.. parsed-literal::
:class: output
tensor([[0, 2, 4],
[1, 3, 5]])
.. code:: python
A.T
.. parsed-literal::
:class: output
array([[0., 2., 4.],
[1., 3., 5.]])
.. code:: python
A.T
.. parsed-literal::
:class: output
Array([[0, 2, 4],
[1, 3, 5]], dtype=int32)
.. code:: python
tf.transpose(A)
.. parsed-literal::
:class: output
<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[0, 2, 4],
       [1, 3, 5]], dtype=int32)>
Symmetric matrices are the subset of square matrices that are equal to
their own transposes: :math:`\mathbf{A} = \mathbf{A}^\top`. The
following matrix is symmetric:
.. code:: python
A = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == A.T
.. parsed-literal::
:class: output
tensor([[True, True, True],
[True, True, True],
[True, True, True]])
.. code:: python
A = np.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == A.T
.. parsed-literal::
:class: output
array([[ True, True, True],
[ True, True, True],
[ True, True, True]])
.. code:: python
A = jnp.array([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == A.T
.. parsed-literal::
:class: output
Array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
.. code:: python
A = tf.constant([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
A == tf.transpose(A)
.. parsed-literal::
:class: output
<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])>
Matrices are useful for representing datasets. Typically, rows
correspond to individual records and columns correspond to distinct
attributes.
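For instance, here is a hypothetical design matrix for three loan
applicants, each described by two attributes (shown in PyTorch; the
numbers are made up purely for illustration):
.. code:: python
# Rows: applicants (records); columns: income ($1000s), years employed
X = torch.tensor([[65.0, 3.0],
                  [90.0, 7.0],
                  [40.0, 1.0]])
X.shape  # (number of records, number of attributes)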
Tensors
-------
While you can go far in your machine learning journey with only scalars,
vectors, and matrices, eventually you may need to work with higher-order
tensors. Tensors give us a generic way of describing extensions to
:math:`n^{\textrm{th}}`-order arrays. We call software objects of the
*tensor class* “tensors” precisely because they too can have arbitrary
numbers of axes. While it may be confusing to use the word *tensor* for
both the mathematical object and its realization in code, our meaning
should usually be clear from context. We denote general tensors by
capital letters with a special font face (e.g., :math:`\mathsf{X}`,
:math:`\mathsf{Y}`, and :math:`\mathsf{Z}`) and their indexing mechanism
(e.g., :math:`x_{ijk}` and :math:`[\mathsf{X}]_{1, 2i-1, 3}`) follows
naturally from that of matrices.
Tensors will become more important when we start working with images.
Each image arrives as a :math:`3^{\textrm{rd}}`-order tensor with axes
corresponding to the height, width, and *channel*. At each spatial
location, the intensities of each color (red, green, and blue) are
stacked along the channel. Furthermore, a collection of images is
represented in code by a :math:`4^{\textrm{th}}`-order tensor, where
distinct images are indexed along the first axis. Higher-order tensors
are constructed, as were vectors and matrices, by growing the number of
shape components.
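As a sketch (shown in PyTorch), a single 32×32 color image and a batch
of eight such images could be laid out as follows; axis conventions
vary across frameworks, so treat this particular layout as illustrative:
.. code:: python
image = torch.zeros(32, 32, 3)     # height, width, channel: a 3rd-order tensor
batch = torch.zeros(8, 32, 32, 3)  # eight images stacked along the first axis
image.ndim, batch.ndim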
.. code:: python
torch.arange(24).reshape(2, 3, 4)
.. parsed-literal::
:class: output
tensor([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
.. code:: python
np.arange(24).reshape(2, 3, 4)
.. parsed-literal::
:class: output
array([[[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]],
[[12., 13., 14., 15.],
[16., 17., 18., 19.],
[20., 21., 22., 23.]]])
.. code:: python
jnp.arange(24).reshape(2, 3, 4)
.. parsed-literal::
:class: output
Array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]], dtype=int32)
.. code:: python
tf.reshape(tf.range(24), (2, 3, 4))
.. parsed-literal::
:class: output
<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]], dtype=int32)>
Basic Properties of Tensor Arithmetic
-------------------------------------
Scalars, vectors, matrices, and higher-order tensors all have some handy
properties. For example, elementwise operations produce outputs that
have the same shape as their operands.
.. code:: python
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone() # Assign a copy of A to B by allocating new memory
A, A + B
.. parsed-literal::
:class: output
(tensor([[0., 1., 2.],
[3., 4., 5.]]),
tensor([[ 0., 2., 4.],
[ 6., 8., 10.]]))
.. code:: python
A = np.arange(6).reshape(2, 3)
B = A.copy() # Assign a copy of A to B by allocating new memory
A, A + B
.. parsed-literal::
:class: output
(array([[0., 1., 2.],
[3., 4., 5.]]),
array([[ 0., 2., 4.],
[ 6., 8., 10.]]))
.. code:: python
A = jnp.arange(6, dtype=jnp.float32).reshape(2, 3)
B = A  # JAX arrays are immutable, so no defensive copy is needed here
A, A + B
.. parsed-literal::
:class: output
(Array([[0., 1., 2.],
[3., 4., 5.]], dtype=float32),
Array([[ 0., 2., 4.],
[ 6., 8., 10.]], dtype=float32))
.. code:: python
A = tf.reshape(tf.range(6, dtype=tf.float32), (2, 3))
B = A  # No new memory is allocated: B is simply another name for A
A, A + B
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0., 1., 2.],
       [3., 4., 5.]], dtype=float32)>,
 <tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 0.,  2.,  4.],
       [ 6.,  8., 10.]], dtype=float32)>)
The elementwise product of two matrices is called their *Hadamard
product* (denoted :math:`\odot`). We can spell out the entries of the
Hadamard product of two matrices
:math:`\mathbf{A}, \mathbf{B} \in \mathbb{R}^{m \times n}`:
.. math::
\mathbf{A} \odot \mathbf{B} =
\begin{bmatrix}
a_{11} b_{11} & a_{12} b_{12} & \dots & a_{1n} b_{1n} \\
a_{21} b_{21} & a_{22} b_{22} & \dots & a_{2n} b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} b_{m1} & a_{m2} b_{m2} & \dots & a_{mn} b_{mn}
\end{bmatrix}.
.. code:: python
A * B
.. parsed-literal::
:class: output
tensor([[ 0., 1., 4.],
[ 9., 16., 25.]])
.. code:: python
A * B
.. parsed-literal::
:class: output
array([[ 0., 1., 4.],
[ 9., 16., 25.]])
.. code:: python
A * B
.. parsed-literal::
:class: output
Array([[ 0., 1., 4.],
[ 9., 16., 25.]], dtype=float32)
.. code:: python
A * B
.. parsed-literal::
:class: output
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 0.,  1.,  4.],
       [ 9., 16., 25.]], dtype=float32)>
Adding or multiplying a scalar and a tensor produces a result with the
same shape as the original tensor. Here, each element of the tensor is
added to (or multiplied by) the scalar.
.. code:: python
a = 2
X = torch.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape
.. parsed-literal::
:class: output
(tensor([[[ 2, 3, 4, 5],
[ 6, 7, 8, 9],
[10, 11, 12, 13]],
[[14, 15, 16, 17],
[18, 19, 20, 21],
[22, 23, 24, 25]]]),
torch.Size([2, 3, 4]))
.. code:: python
a = 2
X = np.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape
.. parsed-literal::
:class: output
(array([[[ 2., 3., 4., 5.],
[ 6., 7., 8., 9.],
[10., 11., 12., 13.]],
[[14., 15., 16., 17.],
[18., 19., 20., 21.],
[22., 23., 24., 25.]]]),
(2, 3, 4))
.. code:: python
a = 2
X = jnp.arange(24).reshape(2, 3, 4)
a + X, (a * X).shape
.. parsed-literal::
:class: output
(Array([[[ 2, 3, 4, 5],
[ 6, 7, 8, 9],
[10, 11, 12, 13]],
[[14, 15, 16, 17],
[18, 19, 20, 21],
[22, 23, 24, 25]]], dtype=int32),
(2, 3, 4))
.. code:: python
a = 2
X = tf.reshape(tf.range(24), (2, 3, 4))
a + X, (a * X).shape
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(2, 3, 4), dtype=int32, numpy=
array([[[ 2,  3,  4,  5],
        [ 6,  7,  8,  9],
        [10, 11, 12, 13]],
       [[14, 15, 16, 17],
        [18, 19, 20, 21],
        [22, 23, 24, 25]]], dtype=int32)>,
 TensorShape([2, 3, 4]))
.. _subsec_lin-alg-reduction:
Reduction
---------
Often, we wish to calculate the sum of a tensor’s elements. To express
the sum of the elements in a vector :math:`\mathbf{x}` of length
:math:`n`, we write :math:`\sum_{i=1}^n x_i`. There is a simple function
for it:
.. code:: python
x = torch.arange(3, dtype=torch.float32)
x, x.sum()
.. parsed-literal::
:class: output
(tensor([0., 1., 2.]), tensor(3.))
.. code:: python
x = np.arange(3)
x, x.sum()
.. parsed-literal::
:class: output
(array([0., 1., 2.]), array(3.))
.. code:: python
x = jnp.arange(3, dtype=jnp.float32)
x, x.sum()
.. parsed-literal::
:class: output
(Array([0., 1., 2.], dtype=float32), Array(3., dtype=float32))
.. code:: python
x = tf.range(3, dtype=tf.float32)
x, tf.reduce_sum(x)
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0., 1., 2.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=3.0>)
To express the sum over the elements of a tensor of arbitrary shape, we
simply sum over all of its axes. For example, the sum of the elements of an
:math:`m \times n` matrix :math:`\mathbf{A}` could be written
:math:`\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}`.
.. code:: python
A.shape, A.sum()
.. parsed-literal::
:class: output
(torch.Size([2, 3]), tensor(15.))
.. code:: python
A.shape, A.sum()
.. parsed-literal::
:class: output
((2, 3), array(15.))
.. code:: python
A.shape, A.sum()
.. parsed-literal::
:class: output
((2, 3), Array(15., dtype=float32))
.. code:: python
A.shape, tf.reduce_sum(A)
.. parsed-literal::
:class: output
(TensorShape([2, 3]), <tf.Tensor: shape=(), dtype=float32, numpy=15.0>)
By default, invoking the sum function *reduces* a tensor along all of
its axes, eventually producing a scalar. Our libraries also allow us to
specify the axes along which the tensor should be reduced. To sum over
all elements along the rows (axis 0), we specify ``axis=0`` in ``sum``.
Since the input matrix reduces along axis 0 to generate the output
vector, this axis is missing from the shape of the output.
.. code:: python
A.shape, A.sum(axis=0).shape
.. parsed-literal::
:class: output
(torch.Size([2, 3]), torch.Size([3]))
.. code:: python
A.shape, A.sum(axis=0).shape
.. parsed-literal::
:class: output
((2, 3), (3,))
.. code:: python
A.shape, A.sum(axis=0).shape
.. parsed-literal::
:class: output
((2, 3), (3,))
.. code:: python
A.shape, tf.reduce_sum(A, axis=0).shape
.. parsed-literal::
:class: output
(TensorShape([2, 3]), TensorShape([3]))
Specifying ``axis=1`` will reduce the column dimension (axis 1) by
summing up elements of all the columns.
.. code:: python
A.shape, A.sum(axis=1).shape
.. parsed-literal::
:class: output
(torch.Size([2, 3]), torch.Size([2]))
.. code:: python
A.shape, A.sum(axis=1).shape
.. parsed-literal::
:class: output
((2, 3), (2,))
.. code:: python
A.shape, A.sum(axis=1).shape
.. parsed-literal::
:class: output
((2, 3), (2,))
.. code:: python
A.shape, tf.reduce_sum(A, axis=1).shape
.. parsed-literal::
:class: output
(TensorShape([2, 3]), TensorShape([2]))
Reducing a matrix along both rows and columns via summation is
equivalent to summing up all the elements of the matrix.
.. code:: python
A.sum(axis=[0, 1]) == A.sum() # Same as A.sum()
.. parsed-literal::
:class: output
tensor(True)
.. code:: python
A.sum(axis=[0, 1]) == A.sum() # Same as A.sum()
.. parsed-literal::
:class: output
array(True)
.. code:: python
A.sum(axis=[0, 1]) == A.sum() # Same as A.sum()
.. parsed-literal::
:class: output
Array(True, dtype=bool)
.. code:: python
tf.reduce_sum(A, axis=[0, 1]), tf.reduce_sum(A) # Same as tf.reduce_sum(A)
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(), dtype=float32, numpy=15.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=15.0>)
A related quantity is the *mean*, also called the *average*. We
calculate the mean by dividing the sum by the total number of elements.
Because computing the mean is so common, it gets a dedicated library
function that works analogously to ``sum``.
.. code:: python
A.mean(), A.sum() / A.numel()
.. parsed-literal::
:class: output
(tensor(2.5000), tensor(2.5000))
.. code:: python
A.mean(), A.sum() / A.size
.. parsed-literal::
:class: output
(array(2.5), array(2.5))
.. code:: python
A.mean(), A.sum() / A.size
.. parsed-literal::
:class: output
(Array(2.5, dtype=float32), Array(2.5, dtype=float32))
.. code:: python
tf.reduce_mean(A), tf.reduce_sum(A) / tf.size(A).numpy()
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(), dtype=float32, numpy=2.5>,
 <tf.Tensor: shape=(), dtype=float32, numpy=2.5>)
Likewise, the function for calculating the mean can also reduce a tensor
along specific axes.
.. code:: python
A.mean(axis=0), A.sum(axis=0) / A.shape[0]
.. parsed-literal::
:class: output
(tensor([1.5000, 2.5000, 3.5000]), tensor([1.5000, 2.5000, 3.5000]))
.. code:: python
A.mean(axis=0), A.sum(axis=0) / A.shape[0]
.. parsed-literal::
:class: output
(array([1.5, 2.5, 3.5]), array([1.5, 2.5, 3.5]))
.. code:: python
A.mean(axis=0), A.sum(axis=0) / A.shape[0]
.. parsed-literal::
:class: output
(Array([1.5, 2.5, 3.5], dtype=float32), Array([1.5, 2.5, 3.5], dtype=float32))
.. code:: python
tf.reduce_mean(A, axis=0), tf.reduce_sum(A, axis=0) / A.shape[0]
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.5, 2.5, 3.5], dtype=float32)>,
 <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1.5, 2.5, 3.5], dtype=float32)>)
.. _subsec_lin-alg-non-reduction:
Non-Reduction Sum
-----------------
Sometimes it can be useful to keep the number of axes unchanged when
invoking the function for calculating the sum or mean. This matters when
we want to use the broadcast mechanism.
.. code:: python
sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape
.. parsed-literal::
:class: output
(tensor([[ 3.],
[12.]]),
torch.Size([2, 1]))
.. code:: python
sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape
.. parsed-literal::
:class: output
(array([[ 3.],
[12.]]),
(2, 1))
.. code:: python
sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape
.. parsed-literal::
:class: output
(Array([[ 3.],
[12.]], dtype=float32),
(2, 1))
.. code:: python
sum_A = tf.reduce_sum(A, axis=1, keepdims=True)
sum_A, sum_A.shape
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
array([[ 3.],
       [12.]], dtype=float32)>,
 TensorShape([2, 1]))
For instance, since ``sum_A`` keeps its two axes after summing each row,
we can divide ``A`` by ``sum_A`` with broadcasting to create a matrix
where each row sums up to :math:`1`.
.. code:: python
A / sum_A
.. parsed-literal::
:class: output
tensor([[0.0000, 0.3333, 0.6667],
[0.2500, 0.3333, 0.4167]])
.. code:: python
A / sum_A
.. parsed-literal::
:class: output
array([[0. , 0.33333334, 0.6666667 ],
[0.25 , 0.33333334, 0.41666666]])
.. code:: python
A / sum_A
.. parsed-literal::
:class: output
Array([[0. , 0.33333334, 0.6666667 ],
[0.25 , 0.33333334, 0.41666666]], dtype=float32)
.. code:: python
A / sum_A
.. parsed-literal::
:class: output
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0.        , 0.33333334, 0.6666667 ],
       [0.25      , 0.33333334, 0.41666666]], dtype=float32)>
If we want to calculate the cumulative sum of elements of ``A`` along
some axis, say ``axis=0`` (row by row), we can call the ``cumsum``
function. By design, this function does not reduce the input tensor
along any axis.
.. code:: python
A.cumsum(axis=0)
.. parsed-literal::
:class: output
tensor([[0., 1., 2.],
[3., 5., 7.]])
.. code:: python
A.cumsum(axis=0)
.. parsed-literal::
:class: output
array([[0., 1., 2.],
[3., 5., 7.]])
.. code:: python
A.cumsum(axis=0)
.. parsed-literal::
:class: output
Array([[0., 1., 2.],
[3., 5., 7.]], dtype=float32)
.. code:: python
tf.cumsum(A, axis=0)
.. parsed-literal::
:class: output
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[0., 1., 2.],
       [3., 5., 7.]], dtype=float32)>
Dot Products
------------
So far, we have only performed elementwise operations, sums, and
averages. And if this was all we could do, linear algebra would not
deserve its own section. Fortunately, this is where things get more
interesting. One of the most fundamental operations is the dot product.
Given two vectors :math:`\mathbf{x}, \mathbf{y} \in \mathbb{R}^d`, their
*dot product* :math:`\mathbf{x}^\top \mathbf{y}` (also known as *inner
product*, :math:`\langle \mathbf{x}, \mathbf{y} \rangle`) is a sum over
the products of the elements at the same position:
:math:`\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{d} x_i y_i`.
.. code:: python
y = torch.ones(3, dtype=torch.float32)
x, y, torch.dot(x, y)
.. parsed-literal::
:class: output
(tensor([0., 1., 2.]), tensor([1., 1., 1.]), tensor(3.))
.. code:: python
y = np.ones(3)
x, y, np.dot(x, y)
.. parsed-literal::
:class: output
(array([0., 1., 2.]), array([1., 1., 1.]), array(3.))
.. code:: python
y = jnp.ones(3, dtype=jnp.float32)
x, y, jnp.dot(x, y)
.. parsed-literal::
:class: output
(Array([0., 1., 2.], dtype=float32),
Array([1., 1., 1.], dtype=float32),
Array(3., dtype=float32))
.. code:: python
y = tf.ones(3, dtype=tf.float32)
x, y, tf.tensordot(x, y, axes=1)
.. parsed-literal::
:class: output
(<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0., 1., 2.], dtype=float32)>,
 <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1., 1., 1.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=3.0>)
Equivalently, we can calculate the dot product of two vectors by
performing an elementwise multiplication followed by a sum:
.. code:: python
torch.sum(x * y)
.. parsed-literal::
:class: output
tensor(3.)
.. code:: python
np.sum(x * y)
.. parsed-literal::
:class: output
array(3.)
.. code:: python
jnp.sum(x * y)
.. parsed-literal::
:class: output
Array(3., dtype=float32)
.. code:: python
tf.reduce_sum(x * y)
.. parsed-literal::
:class: output
<tf.Tensor: shape=(), dtype=float32, numpy=3.0>
Dot products are useful in a wide range of contexts. For example, given
some set of values, denoted by a vector
:math:`\mathbf{x} \in \mathbb{R}^n`, and a set of weights, denoted by
:math:`\mathbf{w} \in \mathbb{R}^n`, the weighted sum of the values in
:math:`\mathbf{x}` according to the weights :math:`\mathbf{w}` could be
expressed as the dot product :math:`\mathbf{x}^\top \mathbf{w}`. When
the weights are nonnegative and sum to :math:`1`, i.e.,
:math:`\sum_{i=1}^{n} w_i = 1`, the dot product expresses
a *weighted average*. After normalizing two vectors to have unit length,
their dot product expresses the cosine of the angle between them. Later in
this section, we will formally introduce this notion of *length*.
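As a quick sketch of both interpretations (shown in PyTorch; the
weights below are arbitrary, and ``torch.norm`` computes the length
introduced later in this section):
.. code:: python
w = torch.tensor([0.2, 0.3, 0.5])  # nonnegative weights that sum to 1
weighted_avg = torch.dot(x, w)     # weighted average of the values in x
cosine = torch.dot(x, y) / (torch.norm(x) * torch.norm(y))
weighted_avg, cosine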
Matrix–Vector Products
----------------------
Now that we know how to calculate dot products, we can begin to
understand the *product* between an :math:`m \times n` matrix
:math:`\mathbf{A}` and an :math:`n`-dimensional vector
:math:`\mathbf{x}`. To start off, we visualize our matrix in terms of
its row vectors
.. math::
\mathbf{A}=
\begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_m \\
\end{bmatrix},
where each :math:`\mathbf{a}^\top_{i} \in \mathbb{R}^n` is a row vector
representing the :math:`i^\textrm{th}` row of the matrix
:math:`\mathbf{A}`.
The matrix–vector product :math:`\mathbf{A}\mathbf{x}` is simply a
column vector of length :math:`m`, whose :math:`i^\textrm{th}` element
is the dot product :math:`\mathbf{a}^\top_i \mathbf{x}`:
.. math::
\mathbf{A}\mathbf{x}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_m \\
\end{bmatrix}\mathbf{x}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \mathbf{x} \\
\mathbf{a}^\top_{2} \mathbf{x} \\
\vdots\\
\mathbf{a}^\top_{m} \mathbf{x}\\
\end{bmatrix}.
We can think of multiplication with a matrix
:math:`\mathbf{A}\in \mathbb{R}^{m \times n}` as a transformation that
projects vectors from :math:`\mathbb{R}^{n}` to :math:`\mathbb{R}^{m}`.
These transformations are remarkably useful. For example, we can
represent rotations as multiplications by certain square matrices.
Matrix–vector products also describe the key calculation involved in
computing the outputs of each layer in a neural network given the
outputs from the previous layer.
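For example, a :math:`2 \times 2` rotation matrix rotates vectors in
the plane. Below is a sketch (shown in PyTorch, with an arbitrarily
chosen angle) that uses the matrix–vector product described next:
.. code:: python
import math
theta = math.pi / 2  # rotate by 90 degrees
R = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])
torch.mv(R, torch.tensor([1.0, 0.0]))  # maps (1, 0) to (approximately) (0, 1)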
To express a matrix–vector product in code, we use the ``mv`` function.
Note that the column dimension of ``A`` (its length along axis 1) must
be the same as the dimension of ``x`` (its length). Python has a
convenience operator ``@`` that can execute both matrix–vector and
matrix–matrix products (depending on its arguments). Thus we can write
``A@x``.
.. code:: python
A.shape, x.shape, torch.mv(A, x), A@x
.. parsed-literal::
:class: output
(torch.Size([2, 3]), torch.Size([3]), tensor([ 5., 14.]), tensor([ 5., 14.]))
To express a matrix–vector product in code, we use the same ``dot``
function. The operation is inferred based on the type of the arguments.
Note that the column dimension of ``A`` (its length along axis 1) must
be the same as the dimension of ``x`` (its length).
.. code:: python
A.shape, x.shape, np.dot(A, x)
.. parsed-literal::
:class: output
((2, 3), (3,), array([ 5., 14.]))
.. code:: python
A.shape, x.shape, jnp.matmul(A, x)
.. parsed-literal::
:class: output
((2, 3), (3,), Array([ 5., 14.], dtype=float32))
To express a matrix–vector product in code, we use the ``matvec``
function. Note that the column dimension of ``A`` (its length along axis
1) must be the same as the dimension of ``x`` (its length).
.. code:: python
A.shape, x.shape, tf.linalg.matvec(A, x)
.. parsed-literal::
:class: output
(TensorShape([2, 3]),
 TensorShape([3]),
 <tf.Tensor: shape=(2,), dtype=float32, numpy=array([ 5., 14.], dtype=float32)>)
Matrix–Matrix Multiplication
----------------------------
Once you have gotten the hang of dot products and matrix–vector
products, then *matrix–matrix multiplication* should be straightforward.
Say that we have two matrices
:math:`\mathbf{A} \in \mathbb{R}^{n \times k}` and
:math:`\mathbf{B} \in \mathbb{R}^{k \times m}`:
.. math::
\mathbf{A}=\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk} \\
\end{bmatrix},\quad
\mathbf{B}=\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1m} \\
b_{21} & b_{22} & \cdots & b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{k1} & b_{k2} & \cdots & b_{km} \\
\end{bmatrix}.
Let :math:`\mathbf{a}^\top_{i} \in \mathbb{R}^k` denote the row vector
representing the :math:`i^\textrm{th}` row of the matrix
:math:`\mathbf{A}` and let :math:`\mathbf{b}_{j} \in \mathbb{R}^k`
denote the column vector from the :math:`j^\textrm{th}` column of the
matrix :math:`\mathbf{B}`:
.. math::
\mathbf{A}=
\begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_n \\
\end{bmatrix},
\quad \mathbf{B}=\begin{bmatrix}
\mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{m} \\
\end{bmatrix}.
To form the matrix product
:math:`\mathbf{C} \in \mathbb{R}^{n \times m}`, we simply compute each
element :math:`c_{ij}` as the dot product between the
:math:`i^{\textrm{th}}` row of :math:`\mathbf{A}` and the
:math:`j^{\textrm{th}}` column of :math:`\mathbf{B}`, i.e.,
:math:`\mathbf{a}^\top_i \mathbf{b}_j`:
.. math::
\mathbf{C} = \mathbf{AB} = \begin{bmatrix}
\mathbf{a}^\top_{1} \\
\mathbf{a}^\top_{2} \\
\vdots \\
\mathbf{a}^\top_n \\
\end{bmatrix}
\begin{bmatrix}
\mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{m} \\
\end{bmatrix}
= \begin{bmatrix}
\mathbf{a}^\top_{1} \mathbf{b}_1 & \mathbf{a}^\top_{1}\mathbf{b}_2& \cdots & \mathbf{a}^\top_{1} \mathbf{b}_m \\
\mathbf{a}^\top_{2}\mathbf{b}_1 & \mathbf{a}^\top_{2} \mathbf{b}_2 & \cdots & \mathbf{a}^\top_{2} \mathbf{b}_m \\
\vdots & \vdots & \ddots &\vdots\\
\mathbf{a}^\top_{n} \mathbf{b}_1 & \mathbf{a}^\top_{n}\mathbf{b}_2& \cdots& \mathbf{a}^\top_{n} \mathbf{b}_m
\end{bmatrix}.
We can think of the matrix–matrix multiplication :math:`\mathbf{AB}` as
performing :math:`m` matrix–vector products or :math:`m \times n` dot
products and stitching the results together to form an
:math:`n \times m` matrix. In the following snippet, we perform matrix
multiplication on ``A`` and ``B``. Here, ``A`` is a matrix with two rows
and three columns, and ``B`` is a matrix with three rows and four
columns. After multiplication, we obtain a matrix with two rows and four
columns.
.. code:: python
B = torch.ones(3, 4)
torch.mm(A, B), A@B
.. parsed-literal::
:class: output
(tensor([[ 3., 3., 3., 3.],
[12., 12., 12., 12.]]),
tensor([[ 3., 3., 3., 3.],
[12., 12., 12., 12.]]))
.. code:: python
B = np.ones(shape=(3, 4))
np.dot(A, B)
.. parsed-literal::
:class: output
array([[ 3., 3., 3., 3.],
[12., 12., 12., 12.]])
.. code:: python
B = jnp.ones((3, 4))
jnp.matmul(A, B)
.. parsed-literal::
:class: output
Array([[ 3., 3., 3., 3.],
[12., 12., 12., 12.]], dtype=float32)
.. code:: python
B = tf.ones((3, 4), tf.float32)
tf.matmul(A, B)
.. parsed-literal::
:class: output
<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[ 3.,  3.,  3.,  3.],
       [12., 12., 12., 12.]], dtype=float32)>
The term *matrix–matrix multiplication* is often simplified to *matrix
multiplication*, and should not be confused with the Hadamard product.
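To see the difference concretely (a small sketch, shown in PyTorch):
.. code:: python
C = torch.full((2, 2), 2.0)
C * C, C @ C  # elementwise (Hadamard) product vs. matrix multiplication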
.. _subsec_lin-algebra-norms:
Norms
-----
Some of the most useful operators in linear algebra are *norms*.
Informally, the norm of a vector tells us how *big* it is. For instance,
the :math:`\ell_2` norm measures the (Euclidean) length of a vector.
Here, we are employing a notion of *size* that concerns the magnitude of
a vector’s components (not its dimensionality).
A norm is a function :math:`\| \cdot \|` that maps a vector to a scalar
and satisfies the following three properties:
1. Given any vector :math:`\mathbf{x}`, if we scale (all elements of)
the vector by a scalar :math:`\alpha \in \mathbb{R}`, its norm scales
accordingly:
.. math:: \|\alpha \mathbf{x}\| = |\alpha| \|\mathbf{x}\|.
2. For any vectors :math:`\mathbf{x}` and :math:`\mathbf{y}`: norms
satisfy the triangle inequality:
.. math:: \|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|.
3. The norm of a vector is nonnegative and it only vanishes if the
vector is zero:
.. math:: \|\mathbf{x}\| > 0 \textrm{ for all } \mathbf{x} \neq 0.
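We can spot-check all three properties numerically for the
:math:`\ell_2` norm introduced below (a sketch with arbitrarily chosen
vectors, shown in PyTorch):
.. code:: python
v, w, alpha = torch.tensor([3.0, -4.0]), torch.tensor([1.0, 2.0]), -2.0
(torch.norm(alpha * v) == abs(alpha) * torch.norm(v),  # scaling
 torch.norm(v + w) <= torch.norm(v) + torch.norm(w),   # triangle inequality
 torch.norm(v) > 0)                                    # positivity for v != 0
# Note: the first check happens to hold exactly here; in general,
# compare floating-point results within a tolerance.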
Many functions are valid norms and different norms encode different
notions of size. The Euclidean norm that we all learned in elementary
school geometry when calculating the hypotenuse of a right triangle is
the square root of the sum of squares of a vector’s elements. Formally,
this is called the :math:`\ell_2` *norm* and expressed as
.. math:: \|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}.
The method ``norm`` calculates the :math:`\ell_2` norm.
.. code:: python
u = torch.tensor([3.0, -4.0])
torch.norm(u)
.. parsed-literal::
:class: output
tensor(5.)
.. code:: python
u = np.array([3, -4])
np.linalg.norm(u)
.. parsed-literal::
:class: output
array(5.)
.. code:: python
u = jnp.array([3.0, -4.0])
jnp.linalg.norm(u)
.. parsed-literal::
:class: output
Array(5., dtype=float32)
.. code:: python
u = tf.constant([3.0, -4.0])
tf.norm(u)
.. parsed-literal::
:class: output
<tf.Tensor: shape=(), dtype=float32, numpy=5.0>
The :math:`\ell_1` norm is also common and the associated measure is
called the Manhattan distance. By definition, the :math:`\ell_1` norm
sums the absolute values of a vector’s elements:
.. math:: \|\mathbf{x}\|_1 = \sum_{i=1}^n \left|x_i \right|.
Compared to the :math:`\ell_2` norm, it is less sensitive to outliers.
To compute the :math:`\ell_1` norm, we compose the absolute value with
the sum operation.
.. code:: python
torch.abs(u).sum()
.. parsed-literal::
:class: output
tensor(7.)
.. code:: python
np.abs(u).sum()
.. parsed-literal::
:class: output
array(7.)
.. code:: python
jnp.linalg.norm(u, ord=1) # same as jnp.abs(u).sum()
.. parsed-literal::
:class: output
Array(7., dtype=float32)
.. code:: python
tf.reduce_sum(tf.abs(u))
.. parsed-literal::
:class: output
<tf.Tensor: shape=(), dtype=float32, numpy=7.0>
Both the :math:`\ell_2` and :math:`\ell_1` norms are special cases of
the more general :math:`\ell_p` *norms*:
.. math:: \|\mathbf{x}\|_p = \left(\sum_{i=1}^n \left|x_i \right|^p \right)^{1/p}.
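For instance, the :math:`\ell_3` norm can be computed either from the
definition or by passing ``p=3`` to ``torch.norm`` (a sketch, shown in
PyTorch):
.. code:: python
u = torch.tensor([3.0, -4.0])
torch.norm(u, p=3), (u.abs() ** 3).sum() ** (1 / 3)  # both roughly 4.498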
In the case of matrices, matters are more complicated. After all,
matrices can be viewed both as collections of individual entries *and*
as objects that operate on vectors and transform them into other
vectors. For instance, we can ask how much longer the matrix–vector
product :math:`\mathbf{X} \mathbf{v}` can be relative to
:math:`\mathbf{v}`. This line of thought leads to what is called the
*spectral* norm. For now, we introduce the *Frobenius norm*, which is
much easier to compute and defined as the square root of the sum of the
squares of a matrix’s elements:
.. math:: \|\mathbf{X}\|_\textrm{F} = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{ij}^2}.
The Frobenius norm behaves as if it were an :math:`\ell_2` norm of a
matrix-shaped vector. Invoking the following function will calculate the
Frobenius norm of a matrix.
.. code:: python
torch.norm(torch.ones((4, 9)))
.. parsed-literal::
:class: output
tensor(6.)
.. code:: python
np.linalg.norm(np.ones((4, 9)))
.. parsed-literal::
:class: output
array(6.)
.. code:: python
jnp.linalg.norm(jnp.ones((4, 9)))
.. parsed-literal::
:class: output
Array(6., dtype=float32)
.. code:: python
tf.norm(tf.ones((4, 9)))
.. parsed-literal::
:class: output
<tf.Tensor: shape=(), dtype=float32, numpy=6.0>
While we do not want to get too far ahead of ourselves, we already can
plant some intuition about why these concepts are useful. In deep
learning, we are often trying to solve optimization problems: *maximize*
the probability assigned to observed data; *maximize* the revenue
associated with a recommender model; *minimize* the distance between
predictions and the ground truth observations; *minimize* the distance
between representations of photos of the same person while *maximizing*
the distance between representations of photos of different people.
These distances, which constitute the objectives of deep learning
algorithms, are often expressed as norms.
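For instance, the distance between a vector of predictions and a
vector of targets is commonly measured by the :math:`\ell_2` norm of
their difference (a sketch with made-up values, shown in PyTorch):
.. code:: python
y_hat = torch.tensor([1.0, 2.0, 3.0])  # hypothetical predictions
y = torch.tensor([1.5, 1.5, 3.5])      # hypothetical targets
torch.norm(y_hat - y)                  # Euclidean distance, roughly 0.866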
Discussion
----------
In this section, we have reviewed all the linear algebra that you will
need to understand a significant chunk of modern deep learning. There is
a lot more to linear algebra, though, and much of it is useful for
machine learning. For example, matrices can be decomposed into factors,
and these decompositions can reveal low-dimensional structure in
real-world datasets. There are entire subfields of machine learning that
focus on using matrix decompositions and their generalizations to
high-order tensors to discover structure in datasets and solve
prediction problems. But this book focuses on deep learning. And we
believe you will be more inclined to learn more mathematics once you
have gotten your hands dirty applying machine learning to real datasets.
So while we reserve the right to introduce more mathematics later on, we
wrap up this section here.
If you are eager to learn more linear algebra, there are many excellent
books and online resources. For a more advanced crash course, consider
checking out :cite:t:`Strang.1993`, :cite:t:`Kolter.2008`, and
:cite:t:`Petersen.Pedersen.ea.2008`.
To recap:
- Scalars, vectors, matrices, and tensors are the basic mathematical
objects used in linear algebra and have zero, one, two, and an
arbitrary number of axes, respectively.
- Tensors can be sliced along specified axes via indexing, or reduced
  via operations such as ``sum`` and ``mean``.
- Elementwise products are called Hadamard products. By contrast, dot
  products, matrix–vector products, and matrix–matrix products are not
  elementwise operations and in general return objects having shapes
  that are different from those of the operands.
- Compared to Hadamard products, matrix–matrix products take
  considerably longer to compute (cubic rather than quadratic time);
  see the sketch after this list.
- Norms capture various notions of the magnitude of a vector (or
matrix), and are commonly applied to the difference of two vectors to
measure their distance apart.
- Common vector norms include the :math:`\ell_1` and :math:`\ell_2`
norms, and common matrix norms include the *spectral* and *Frobenius*
norms.
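As a rough illustration of that cost gap (a sketch, not a careful
benchmark; absolute timings depend on hardware and libraries):
.. code:: python
import time
M = torch.randn(1024, 1024)
start = time.time()
M * M                          # Hadamard product: about n**2 operations
elementwise_time = time.time() - start
start = time.time()
M @ M                          # matrix multiplication: about n**3 operations
matmul_time = time.time() - start
elementwise_time, matmul_time  # the second is typically much larger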
Exercises
---------
1. Prove that the transpose of the transpose of a matrix is the matrix
itself: :math:`(\mathbf{A}^\top)^\top = \mathbf{A}`.
2. Given two matrices :math:`\mathbf{A}` and :math:`\mathbf{B}`, show
that sum and transposition commute:
:math:`\mathbf{A}^\top + \mathbf{B}^\top = (\mathbf{A} + \mathbf{B})^\top`.
3. Given any square matrix :math:`\mathbf{A}`, is
:math:`\mathbf{A} + \mathbf{A}^\top` always symmetric? Can you prove
the result by using only the results of the previous two exercises?
4. We defined the tensor ``X`` of shape (2, 3, 4) in this section. What
is the output of ``len(X)``? Write your answer without implementing
any code, then check your answer using code.
5. For a tensor ``X`` of arbitrary shape, does ``len(X)`` always
correspond to the length of a certain axis of ``X``? What is that
axis?
6. Run ``A / A.sum(axis=1)`` and see what happens. Can you analyze the
results?
7. When traveling between two points in downtown Manhattan, what is the
distance that you need to cover in terms of the coordinates, i.e.,
in terms of avenues and streets? Can you travel diagonally?
8. Consider a tensor of shape (2, 3, 4). What are the shapes of the
summation outputs along axes 0, 1, and 2?
9. Feed a tensor with three or more axes to the ``linalg.norm``
function and observe its output. What does this function compute for
tensors of arbitrary shape?
10. Consider three large matrices, say
:math:`\mathbf{A} \in \mathbb{R}^{2^{10} \times 2^{16}}`,
:math:`\mathbf{B} \in \mathbb{R}^{2^{16} \times 2^{5}}` and
:math:`\mathbf{C} \in \mathbb{R}^{2^{5} \times 2^{14}}`, initialized
with Gaussian random variables. You want to compute the product
:math:`\mathbf{A} \mathbf{B} \mathbf{C}`. Is there any difference in
memory footprint and speed, depending on whether you compute
:math:`(\mathbf{A} \mathbf{B}) \mathbf{C}` or
:math:`\mathbf{A} (\mathbf{B} \mathbf{C})`? Why?
11. Consider three large matrices, say
:math:`\mathbf{A} \in \mathbb{R}^{2^{10} \times 2^{16}}`,
:math:`\mathbf{B} \in \mathbb{R}^{2^{16} \times 2^{5}}` and
:math:`\mathbf{C} \in \mathbb{R}^{2^{5} \times 2^{16}}`. Is there
any difference in speed depending on whether you compute
:math:`\mathbf{A} \mathbf{B}` or :math:`\mathbf{A} \mathbf{C}^\top`?
Why? What changes if you initialize
:math:`\mathbf{C} = \mathbf{B}^\top` without cloning memory? Why?
12. Consider three matrices, say
:math:`\mathbf{A}, \mathbf{B}, \mathbf{C} \in \mathbb{R}^{100 \times 200}`.
Construct a tensor with three axes by stacking
:math:`[\mathbf{A}, \mathbf{B}, \mathbf{C}]`. What is the
dimensionality? Slice out the second coordinate of the third axis to
recover :math:`\mathbf{B}`. Check that your answer is correct.