The notation used throughout this book is summarized below.

Numerical Objects

  • \(x\): A scalar

  • \(\mathbf{x}\): A vector

  • \(\mathbf{X}\): A matrix

  • \(\mathsf{X}\): A tensor

  • \(\mathbf{I}\): An identity matrix

  • \(x_i\), \([\mathbf{x}]_i\): The \(i^\mathrm{th}\) element of vector \(\mathbf{x}\)

  • \(x_{ij}\), \(x_{i,j}\), \([\mathbf{X}]_{ij}\), \([\mathbf{X}]_{i,j}\): The element of matrix \(\mathbf{X}\) at row \(i\) and column \(j\)
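
As a rough illustration of the objects listed above, the following sketch maps each one to a plain Python value. Nested lists stand in for vectors, matrices, and tensors, and the helper names `identity` and `element` are hypothetical, introduced only for this example. The sketch also assumes that the subscripts \(i\) and \(j\) count from 1, whereas Python lists count from 0.

```python
# Plain Python lists stand in for vectors, matrices, and tensors here
# (an assumption for illustration; an array library would work as well).
x = 3.0                          # scalar x
v = [1.0, 2.0, 3.0]              # vector x in R^3
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]            # matrix X in R^{2 x 3}
T = [[[0, 1], [2, 3]],
     [[4, 5], [6, 7]]]           # tensor with three axes, shape 2 x 2 x 2


def identity(n):
    """Return the n-by-n identity matrix I as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def element(M, i, j):
    """Return [M]_{ij}, taking i and j as 1-based indices (assumed convention)."""
    return M[i - 1][j - 1]


I = identity(2)
x_2 = v[2 - 1]             # x_2: the second element of the vector
x_23 = element(X, 2, 3)    # x_{23}: row 2, column 3 of the matrix
```

The only subtle point is the index shift: under the assumed 1-based reading, \(x_{23}\) corresponds to `X[1][2]` in code.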

Set Theory

  • \(\mathcal{X}\): A set

  • \(\mathbb{Z}\): The set of integers

  • \(\mathbb{Z}^+\): The set of positive integers

  • \(\mathbb{R}\): The set of real numbers

  • \(\mathbb{R}^n\): The set of \(n\)-dimensional vectors of real numbers

  • \(\mathbb{R}^{a\times b}\): The set of matrices of real numbers with \(a\) rows and \(b\) columns

  • \(|\mathcal{X}|\): Cardinality (number of elements) of set \(\mathcal{X}\)

  • \(\mathcal{A}\cup\mathcal{B}\): Union of sets \(\mathcal{A}\) and \(\mathcal{B}\)

  • \(\mathcal{A}\cap\mathcal{B}\): Intersection of sets \(\mathcal{A}\) and \(\mathcal{B}\)

  • \(\mathcal{A}\setminus\mathcal{B}\): Set difference of \(\mathcal{A}\) and \(\mathcal{B}\) (the elements of \(\mathcal{A}\) that do not belong to \(\mathcal{B}\))
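
The set operations above correspond closely to Python's built-in `set` type. The sketch below uses two small example sets, chosen arbitrarily for illustration, to show cardinality, union, intersection, and set difference.

```python
# Two small example sets (values chosen only for illustration).
A = {1, 2, 3, 4}
B = {3, 4, 5}

cardinality = len(A)     # |A|: number of elements in A
union = A | B            # A ∪ B: elements in A or in B
intersection = A & B     # A ∩ B: elements in both A and B
difference = A - B       # A \ B: elements of A that are not in B
```

Note that set difference is not symmetric: `A - B` and `B - A` generally contain different elements.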

Functions and Operators

  • \(f(\cdot)\): A function

  • \(\log(\cdot)\): The natural logarithm

  • \(\exp(\cdot)\): The exponential function

  • \(\mathbf{1}_\mathcal{X}\): The indicator function of set \(\mathcal{X}\) (1 if the argument belongs to \(\mathcal{X}\), 0 otherwise)

  • \((\cdot)^\top\): Transpose of a vector or a matrix

  • \(\mathbf{X}^{-1}\): Inverse of matrix \(\mathbf{X}\)

  • \(\odot\): Hadamard (elementwise) product

  • \([\cdot, \cdot]\): Concatenation

  • \(\lvert \mathcal{X} \rvert\): Cardinality of set \(\mathcal{X}\)

  • \(\|\cdot\|_p\): \(L_p\) norm

  • \(\|\cdot\|\): \(L_2\) norm

  • \(\langle \mathbf{x}, \mathbf{y} \rangle\): Dot product of vectors \(\mathbf{x}\) and \(\mathbf{y}\)

  • \(\sum\): Summation over a collection of elements

  • \(\prod\): Product over a collection of elements

  • \(\stackrel{\mathrm{def}}{=}\): Equality by definition (the symbol on the left-hand side is defined as the expression on the right-hand side)
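
Several of the operators above can be written out in a few lines of standard-library Python. The following is a minimal sketch, not a reference implementation: vectors are plain lists, and the helper names (`transpose`, `hadamard`, `dot`, `norm`, `indicator`) are hypothetical, introduced only for this example.

```python
import math


def transpose(M):
    """Return M^T for a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def hadamard(x, y):
    """Elementwise (Hadamard) product of two equal-length vectors."""
    return [a * b for a, b in zip(x, y)]


def dot(x, y):
    """Dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def norm(x, p=2):
    """L_p norm of x; the default p=2 matches the unsubscripted norm."""
    return sum(abs(a) ** p for a in x) ** (1 / p)


def indicator(S):
    """Indicator function of set S: maps z to 1 if z is in S, else 0."""
    return lambda z: 1 if z in S else 0


x, y = [3.0, 4.0], [1.0, 2.0]
concatenated = x + y                   # [x, y]: concatenation
total = sum([1, 2, 3, 4])              # series addition
product = math.prod([1, 2, 3, 4])      # series multiplication
roundtrip = math.log(math.exp(1.0))    # log is the natural logarithm
```

For example, `norm(x)` is the \(L_2\) norm of `x`, `norm(x, 1)` is its \(L_1\) norm, and `indicator({1, 2})(3)` evaluates to 0.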


Calculus

  • \(\frac{dy}{dx}\): Derivative of \(y\) with respect to \(x\)

  • \(\frac{\partial y}{\partial x}\): Partial derivative of \(y\) with respect to \(x\)

  • \(\nabla_{\mathbf{x}} y\): Gradient of \(y\) with respect to \(\mathbf{x}\)

  • \(\int_a^b f(x) \;dx\): Definite integral of \(f\) from \(a\) to \(b\) with respect to \(x\)

  • \(\int f(x) \;dx\): Indefinite integral of \(f\) with respect to \(x\)
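
To make the derivative, gradient, and definite integral concrete, the sketch below approximates each one numerically. It swaps in central finite differences and the trapezoidal rule, which are standard approximations and not something this notation list prescribes. The function names and the step sizes `h` and `n` are assumptions made for illustration.

```python
def derivative(f, x, h=1e-6):
    """Approximate dy/dx for y = f(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gradient(f, x, h=1e-6):
    """Approximate the gradient of y = f(x) with respect to the vector x.

    Each component is a partial derivative: only one coordinate is
    perturbed at a time while the others are held fixed.
    """
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def integral(f, a, b, n=1000):
    """Approximate the definite integral of f from a to b (trapezoidal rule)."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * step)
    return total * step


slope = derivative(lambda t: t ** 2, 3.0)                    # close to 6
grad = gradient(lambda z: z[0] ** 2 + z[1] ** 2, [1.0, 2.0])  # close to [2, 4]
area = integral(lambda t: t ** 2, 0.0, 1.0)                  # close to 1/3
```

These results are approximations whose accuracy depends on `h` and `n`; the notation itself refers to the exact quantities.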

Probability and Information Theory

  • \(P(\cdot)\): Probability distribution

  • \(z \sim P\): Random variable \(z\) has probability distribution \(P\)

  • \(P(X \mid Y)\): Conditional probability distribution of \(X\) given \(Y\)

  • \(p(x)\): Probability density function

  • \(E_{x}[f(x)]\): Expectation of \(f\) with respect to \(x\)

  • \(X \perp Y\): Random variables \(X\) and \(Y\) are independent

  • \(X \perp Y \mid Z\): Random variables \(X\) and \(Y\) are conditionally independent given random variable \(Z\)

  • \(\mathrm{Var}(X)\): Variance of random variable \(X\)

  • \(\sigma_X\): Standard deviation of random variable \(X\)

  • \(\mathrm{Cov}(X, Y)\): Covariance of random variables \(X\) and \(Y\)

  • \(\rho(X, Y)\): Correlation of random variables \(X\) and \(Y\)

  • \(H(X)\): Entropy of random variable \(X\)

  • \(D_{\mathrm{KL}}(P\|Q)\): KL-divergence (relative entropy) from distribution \(Q\) to distribution \(P\)
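
For a discrete random variable, several of the quantities above reduce to short sums. The sketch below assumes a distribution is stored as a dictionary mapping each outcome to its probability, and it uses the natural logarithm, consistent with the \(\log\) convention in this list, so entropy and KL-divergence come out in nats. The function names are hypothetical, and the two example distributions are arbitrary.

```python
import math


def expectation(P, f=lambda x: x):
    """E_x[f(x)] for a discrete distribution P given as {outcome: probability}."""
    return sum(p * f(x) for x, p in P.items())


def variance(P):
    """Var(X): the expected squared deviation from the mean."""
    mu = expectation(P)
    return expectation(P, lambda x: (x - mu) ** 2)


def entropy(P):
    """H(X) in nats; outcomes with zero probability contribute nothing."""
    return -sum(p * math.log(p) for p in P.values() if p > 0)


def kl_divergence(P, Q):
    """D_KL(P || Q) in nats; assumes Q[x] > 0 wherever P[x] > 0."""
    return sum(p * math.log(p / Q[x]) for x, p in P.items() if p > 0)


P = {0: 0.5, 1: 0.5}      # example distribution: a fair coin
Q = {0: 0.25, 1: 0.75}    # example distribution: a biased coin

mean = expectation(P)
var = variance(P)
std = math.sqrt(var)      # sigma_X: the square root of the variance
h = entropy(P)
kl = kl_divergence(P, Q)
```

The KL-divergence is not symmetric, so `kl_divergence(P, Q)` and `kl_divergence(Q, P)` generally differ, and it is zero when the two distributions are identical.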


Complexity

  • \(\mathcal{O}\): Big O notation