22.8. Distributions
Now that we have learned how to work with probability in both the discrete and the continuous setting, let’s get to know some of the common distributions encountered. Depending on the area of machine learning, we may need to be familiar with vastly more of these, or for some areas of deep learning potentially none at all. This is, however, a good basic list to be familiar with. Let’s first import some common libraries.
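The import cell itself does not appear in this extract; a minimal setup that would support the PyTorch snippets sketched in the rest of this section might look like the following (the NumPy and TensorFlow cells would import their own frameworks instead).

%matplotlib inline
import torch
from d2l import torch as d2l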
22.8.1. Bernoulli
This is the simplest random variable usually encountered. This random variable encodes a coin flip which comes up $1$ with probability $p$ and $0$ with probability $1-p$. If we have a random variable $X$ with this distribution, we will write $X \sim \textrm{Bernoulli}(p)$.

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 0, \\ 1-p & 0 \le x < 1, \\ 1 & x \ge 1. \end{cases}$$

The probability mass function is plotted below.
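The plotting cell is not reproduced in this extract; a minimal sketch using d2l's matplotlib wrapper, with p = 0.3 chosen purely for illustration, could be:

p = 0.3

# stem plot of the two-point p.m.f.: P(X=0) = 1 - p, P(X=1) = p
d2l.plt.stem([0, 1], [1 - p, p])
d2l.plt.xlabel('x')
d2l.plt.ylabel('p.m.f.')
d2l.plt.show()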
Now, let’s plot the cumulative distribution function (22.8.2).
If $X \sim \textrm{Bernoulli}(p)$, then:

$$\mu_X = p, \quad \sigma_X^2 = p(1-p).$$
We can sample an array of arbitrary shape from a Bernoulli random variable as follows.
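The sampling cell is likewise missing here; one simple way to draw such samples in PyTorch (thresholding uniform noise) is sketched below. The three arrays that follow are the outputs of the PyTorch, NumPy, and TensorFlow versions of that cell.

p = 0.3  # illustrative success probability

# each entry is 1 with probability p and 0 otherwise
1 * (torch.rand(10, 10) < p)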
tensor([[0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
[1, 1, 0, 0, 1, 1, 1, 1, 1, 0],
[1, 0, 0, 0, 1, 0, 1, 1, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 1, 1, 0, 1, 0, 0]])
array([[0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
[0, 1, 0, 0, 0, 0, 0, 1, 1, 0],
[1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 1, 0, 0, 0, 0, 1],
[1, 0, 1, 0, 0, 1, 1, 0, 0, 0]])
<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[0., 0., 0., 0., 1., 1., 0., 1., 0., 0.],
[0., 0., 1., 0., 0., 1., 0., 0., 0., 0.],
[1., 0., 0., 1., 0., 1., 0., 0., 1., 0.],
[1., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 1., 1., 1.],
[0., 1., 1., 0., 0., 0., 0., 0., 0., 1.],
[0., 1., 0., 1., 0., 1., 1., 0., 1., 0.]], dtype=float32)>
22.8.2. Discrete Uniform
The next commonly encountered random variable is a discrete uniform. For our discussion here, we will assume that it is supported on the integers $\{1, 2, \ldots, n\}$, however any other set of values can be freely chosen. The meaning of the word uniform in this context is that every possible value is equally likely, so the probability of each value $i \in \{1, 2, \ldots, n\}$ is $p_i = \frac{1}{n}$. We will denote a random variable $X$ with this distribution as $X \sim U(n)$.

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 1, \\ \frac{\lfloor x \rfloor}{n} & 1 \le x < n, \\ 1 & x \ge n. \end{cases}$$

Let's first plot the probability mass function.
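As with the Bernoulli case, the plotting cell is not included here; a sketch with an illustrative n = 5 might be:

n = 5

# every value in {1, ..., n} has probability 1/n
d2l.plt.stem([i + 1 for i in range(n)], n * [1 / n])
d2l.plt.xlabel('x')
d2l.plt.ylabel('p.m.f.')
d2l.plt.show()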
Now, let’s plot the cumulative distribution function (22.8.4).
If $X \sim U(n)$, then:

$$\mu_X = \frac{1+n}{2}, \quad \sigma_X^2 = \frac{n^2-1}{12}.$$
We can sample an array of arbitrary shape from a discrete uniform random variable as follows.
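The sampling cell is not shown; a PyTorch sketch that reproduces the range {1, ..., 4} seen in the outputs below is the following (note that torch.randint's upper bound is exclusive).

n = 5

# draws integers uniformly from {1, ..., n - 1} since the upper bound is exclusive
torch.randint(1, n, size=(10, 10))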
tensor([[1, 4, 3, 2, 1, 1, 3, 1, 1, 4],
[4, 1, 1, 4, 4, 1, 4, 3, 2, 4],
[2, 4, 4, 1, 4, 2, 4, 3, 2, 1],
[1, 2, 3, 1, 1, 4, 2, 4, 1, 3],
[1, 2, 4, 1, 4, 3, 3, 2, 2, 1],
[1, 2, 2, 4, 1, 3, 2, 4, 2, 3],
[1, 2, 3, 4, 1, 3, 4, 1, 4, 3],
[3, 1, 1, 4, 4, 1, 3, 1, 1, 2],
[2, 2, 4, 3, 4, 2, 3, 4, 2, 4],
[1, 4, 3, 3, 2, 3, 3, 4, 1, 3]])
array([[3, 4, 2, 1, 2, 1, 4, 4, 1, 4],
[2, 3, 4, 2, 1, 4, 4, 2, 2, 4],
[3, 4, 3, 4, 4, 4, 2, 4, 2, 4],
[3, 4, 4, 4, 1, 3, 1, 2, 4, 1],
[2, 2, 4, 1, 2, 4, 4, 3, 1, 2],
[3, 4, 4, 3, 4, 1, 1, 1, 4, 2],
[2, 1, 2, 1, 2, 2, 4, 4, 2, 2],
[3, 4, 3, 3, 3, 3, 3, 4, 4, 1],
[2, 1, 4, 2, 4, 2, 1, 2, 3, 1],
[3, 4, 1, 2, 2, 4, 4, 4, 4, 3]])
<tf.Tensor: shape=(10, 10), dtype=int32, numpy=
array([[2, 4, 1, 2, 3, 2, 4, 4, 1, 4],
[1, 1, 2, 2, 1, 3, 4, 1, 1, 2],
[2, 1, 4, 3, 1, 4, 1, 1, 2, 2],
[2, 1, 3, 1, 4, 2, 2, 3, 3, 4],
[2, 3, 2, 1, 2, 4, 3, 3, 2, 2],
[3, 3, 3, 3, 1, 3, 4, 3, 4, 1],
[2, 2, 3, 3, 2, 1, 1, 2, 2, 4],
[2, 2, 1, 1, 3, 4, 3, 1, 4, 2],
[3, 4, 2, 1, 4, 4, 1, 4, 2, 2],
[2, 3, 1, 2, 4, 1, 2, 1, 2, 2]], dtype=int32)>
22.8.3. Continuous Uniform
Next, let's discuss the continuous uniform distribution. The idea behind this random variable is that if we increase the $n$ in the discrete uniform distribution, and then scale it to fit within the interval $[a, b]$, we will approach a continuous random variable that just picks an arbitrary value in $[a, b]$, all with equal probability. We will denote this distribution as $X \sim U(a, b)$.

The probability density function is

$$p(x) = \begin{cases} \frac{1}{b-a} & x \in [a, b], \\ 0 & x \notin [a, b]. \end{cases}$$

The cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < a, \\ \frac{x-a}{b-a} & x \in [a, b], \\ 1 & x > b. \end{cases}$$
Let’s first plot the probability density function (22.8.6).
Now, let’s plot the cumulative distribution function (22.8.7).
If $X \sim U(a, b)$, then:

$$\mu_X = \frac{a+b}{2}, \quad \sigma_X^2 = \frac{(b-a)^2}{12}.$$
We can sample an array of arbitrary shape from a uniform random variable as follows. Note that it by default samples from a $U(0, 1)$, so if we want a different range we need to scale the samples.
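The sampling cell is not shown; the outputs below lie in [1, 3], so a PyTorch sketch assuming a = 1 and b = 3 is:

a, b = 1, 3

# rescale U(0, 1) samples to U(a, b)
(b - a) * torch.rand(10, 10) + a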
tensor([[2.4857, 2.2461, 1.6809, 2.7434, 2.7072, 2.6190, 1.4883, 1.2517, 1.3454,
2.4754],
[1.0974, 1.5680, 1.8788, 2.8231, 2.1695, 2.6461, 1.4914, 1.4887, 1.3860,
1.9090],
[1.3746, 1.7773, 1.2412, 1.1950, 2.7281, 2.8356, 1.2266, 2.4724, 2.4641,
2.8991],
[2.4018, 2.6727, 1.0308, 1.1951, 1.9390, 1.6486, 2.8314, 1.1025, 1.3354,
1.0130],
[1.1281, 1.8000, 2.3788, 2.6580, 1.6750, 2.2081, 1.2705, 1.0757, 2.3311,
2.6557],
[2.9912, 1.2263, 1.8115, 1.5940, 1.9321, 1.6469, 2.2990, 2.1473, 1.8165,
1.2806],
[1.1672, 1.1536, 1.9649, 2.1655, 1.7170, 1.0284, 1.3305, 2.1904, 1.4036,
2.1958],
[2.5891, 2.5840, 2.2679, 2.0687, 2.9249, 1.6741, 1.2238, 2.4463, 2.2235,
2.7038],
[1.8697, 2.4965, 1.5785, 2.7890, 2.3319, 2.1434, 2.3333, 1.0286, 1.9245,
1.7640],
[1.2504, 1.7558, 1.4322, 1.5226, 1.3380, 1.1388, 1.8707, 2.2330, 2.3818,
2.2087]])
array([[2.38360201, 1.42301059, 1.30828215, 2.23648218, 2.36792603,
1.91291633, 2.86068987, 1.82011582, 2.04179583, 1.60297964],
[2.16824638, 1.57385641, 1.66921053, 1.43114352, 2.25602411,
2.87490344, 2.40876076, 1.7617666 , 2.02837681, 1.95209339],
[2.10921523, 2.19732773, 1.59625198, 1.61302107, 1.27852537,
2.37811459, 2.29000406, 1.03847199, 1.56422557, 2.50686118],
[1.7817774 , 1.62100143, 2.27307703, 2.05133929, 2.05104624,
2.96610051, 2.89734953, 1.21910903, 2.9754619 , 2.48726223],
[2.56736775, 1.839721 , 2.95232472, 1.12483235, 2.5400353 ,
2.29622885, 2.28849311, 2.52556794, 1.11539063, 1.49332251],
[1.87762881, 2.0559545 , 1.62359339, 1.90967816, 2.98212587,
1.21525452, 2.68658767, 2.54676585, 1.1852055 , 2.45969756],
[2.07266639, 2.95876653, 2.00955484, 1.55029107, 1.50520493,
1.88796762, 1.92171128, 2.02120858, 1.56685236, 2.6619405 ],
[1.11606361, 1.40236782, 1.0776729 , 1.41579594, 2.87791721,
1.28461063, 1.91013181, 1.59194299, 2.97532135, 2.85899927],
[2.06719995, 1.70292102, 2.4059567 , 1.61806169, 1.81718481,
2.92306811, 2.31158504, 1.05026323, 1.57910039, 1.83457301],
[1.85492878, 1.84662898, 1.41416257, 1.05939756, 1.23999994,
2.11843352, 2.93857488, 1.05851556, 1.69802914, 2.87658077]])
<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[2.8664184, 2.438953 , 2.183648 , 2.6834862, 2.5966637, 1.1439416,
2.608807 , 1.0293896, 2.3320234, 1.8846245],
[1.080436 , 1.3330972, 2.3824317, 1.6961837, 2.3930948, 2.5123057,
2.1429653, 1.4829049, 2.1934493, 2.2013004],
[1.0424838, 2.796021 , 2.7420611, 1.4439683, 1.0762017, 2.6044428,
2.288131 , 2.7516022, 1.878279 , 1.0029557],
[2.6579297, 2.4939828, 2.953441 , 2.1348112, 1.7846551, 2.8381727,
1.4484763, 2.6948266, 1.966721 , 2.1762617],
[1.6473658, 2.3157299, 1.5706291, 2.6134923, 2.5549824, 1.2292521,
1.2990353, 1.1018548, 2.3749387, 2.814359 ],
[1.3394063, 1.7971177, 1.6891305, 2.0523329, 1.7005038, 2.4614336,
2.6337047, 2.2743304, 2.2163136, 1.7015438],
[1.9442399, 2.6003797, 2.0429137, 2.23415 , 2.2446375, 1.4651737,
2.4320726, 1.5983024, 1.5828397, 1.197459 ],
[1.8227904, 2.2702656, 1.9956629, 1.6375196, 1.4135013, 1.7102294,
2.4104555, 2.0014505, 1.4420359, 2.340128 ],
[1.5781457, 2.5949705, 2.9382844, 1.0134435, 2.4329488, 1.2575395,
1.6634142, 1.7678592, 2.8386252, 1.0254025],
[1.6798866, 2.7402108, 1.1072655, 2.0986164, 1.3502924, 2.2395515,
2.4990425, 1.8304801, 2.674482 , 2.3498237]], dtype=float32)>
22.8.4. Binomial
Let's make things a little more complex and examine the binomial random variable. This random variable originates from performing a sequence of $n$ independent experiments, each of which has probability $p$ of succeeding, and asking how many successes we expect to see.

Let's express this mathematically. Each experiment is an independent random variable $X_i$ where we will use $1$ to encode success, and $0$ to encode failure. Since each is an independent coin flip which is successful with probability $p$, we can say that $X_i \sim \textrm{Bernoulli}(p)$. Then, the binomial random variable is

$$X = \sum_{i=1}^n X_i.$$

In this case, we will write $X \sim \textrm{Binomial}(n, p)$.

To get the cumulative distribution function, we need to notice that getting exactly $k$ successes can occur in $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ ways, each of which has a probability of $p^k(1-p)^{n-k}$ of occurring. Thus the cumulative distribution function is

$$F(x) = \begin{cases} 0 & x < 0, \\ \sum_{m \le k} \binom{n}{m} p^m (1-p)^{n-m} & k \le x < k+1 \textrm{ with } 0 \le k < n, \\ 1 & x \ge n. \end{cases}$$
Let’s first plot the probability mass function.
# PyTorch version
n, p = 10, 0.2

# Compute binomial coefficient
def binom(n, k):
    comb = 1
    for i in range(min(k, n - k)):
        comb = comb * (n - i) // (i + 1)
    return comb

pmf = torch.tensor([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])

d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
d2l.plt.xlabel('x')
d2l.plt.ylabel('p.m.f.')
d2l.plt.show()
# NumPy (np) version
n, p = 10, 0.2

# Compute binomial coefficient
def binom(n, k):
    comb = 1
    for i in range(min(k, n - k)):
        comb = comb * (n - i) // (i + 1)
    return comb

pmf = np.array([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])

d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
d2l.plt.xlabel('x')
d2l.plt.ylabel('p.m.f.')
d2l.plt.show()
# TensorFlow version
n, p = 10, 0.2

# Compute binomial coefficient
def binom(n, k):
    comb = 1
    for i in range(min(k, n - k)):
        comb = comb * (n - i) // (i + 1)
    return comb

pmf = tf.constant([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])

d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
d2l.plt.xlabel('x')
d2l.plt.ylabel('p.m.f.')
d2l.plt.show()
Now, let’s plot the cumulative distribution function (22.8.10).
If $X \sim \textrm{Binomial}(n, p)$, then:

$$\mu_X = np, \quad \sigma_X^2 = np(1-p).$$
This follows from the linearity of expected value over the sum of $n$ Bernoulli random variables, and the fact that the variance of a sum of independent random variables is the sum of the variances. This can be sampled as follows.
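The sampling cell itself is missing from this extract; a PyTorch sketch using torch.distributions with the same n = 10 and p = 0.2 as above is:

n, p = 10, 0.2

# number of successes in n Bernoulli(p) trials, drawn 10 x 10 times
m = torch.distributions.binomial.Binomial(n, p)
m.sample(sample_shape=(10, 10))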
tensor([[6., 3., 4., 3., 3., 1., 3., 3., 3., 3.],
[3., 1., 2., 2., 3., 2., 1., 3., 1., 4.],
[6., 1., 0., 3., 0., 3., 1., 0., 1., 1.],
[1., 2., 3., 1., 2., 2., 2., 2., 3., 2.],
[2., 2., 5., 4., 1., 3., 4., 3., 2., 0.],
[2., 0., 2., 2., 3., 1., 1., 4., 3., 1.],
[1., 1., 3., 2., 4., 2., 2., 2., 1., 0.],
[0., 3., 2., 1., 1., 3., 2., 1., 1., 3.],
[2., 3., 2., 3., 4., 3., 1., 2., 1., 2.],
[1., 2., 1., 1., 3., 2., 4., 3., 3., 2.]])
array([[2, 1, 1, 2, 0, 3, 3, 1, 3, 4],
[0, 2, 0, 2, 2, 1, 2, 1, 1, 2],
[2, 2, 1, 1, 1, 2, 2, 3, 2, 3],
[3, 2, 3, 2, 3, 2, 1, 1, 4, 1],
[2, 2, 1, 2, 0, 2, 2, 1, 1, 2],
[1, 1, 1, 0, 2, 0, 3, 3, 1, 0],
[3, 3, 0, 3, 2, 2, 0, 1, 4, 4],
[0, 1, 0, 1, 2, 5, 1, 3, 1, 0],
[0, 3, 2, 4, 2, 1, 3, 3, 3, 3],
[4, 3, 3, 2, 3, 2, 1, 3, 0, 1]])
<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[6., 9., 9., 9., 8., 4., 6., 7., 5., 7.],
[7., 7., 6., 7., 4., 4., 6., 6., 7., 7.],
[6., 3., 5., 9., 7., 7., 6., 6., 5., 7.],
[4., 6., 5., 2., 5., 6., 8., 8., 4., 7.],
[5., 3., 6., 6., 8., 5., 4., 5., 7., 7.],
[5., 3., 6., 3., 4., 5., 4., 1., 7., 5.],
[6., 6., 6., 5., 4., 6., 7., 8., 5., 3.],
[5., 8., 6., 9., 4., 5., 3., 7., 5., 7.],
[5., 2., 5., 6., 6., 8., 8., 4., 4., 6.],
[6., 7., 3., 5., 4., 5., 3., 4., 6., 7.]], dtype=float32)>
22.8.5. Poisson
Let's now perform a thought experiment. We are standing at a bus stop and we want to know how many buses will arrive in the next minute. Let's start by considering $X^{(1)} \sim \textrm{Bernoulli}(p)$, which is simply the probability that a bus arrives in the one minute window. For bus stops far from an urban center, this might be a pretty good approximation. We may never see more than one bus in a minute.

However, if we are in a busy area, it is possible or even likely that two buses will arrive. We can model this by splitting our random variable into two parts for the first 30 seconds, or the second 30 seconds. In this case we can write

$$X^{(2)} \sim X^{(2)}_1 + X^{(2)}_2,$$

where $X^{(2)}$ is the total sum, and $X^{(2)}_i \sim \textrm{Bernoulli}(p/2)$. The total distribution is then $X^{(2)} \sim \textrm{Binomial}(2, p/2)$.

Why stop here? Let's continue to split that minute into $n$ parts. By the same reasoning as above, we see that $X^{(n)} \sim \textrm{Binomial}(n, p/n)$.

Consider these random variables. By the previous section, we know that (22.8.12) has mean $\mu_{X^{(n)}} = n(p/n) = p$ and variance $\sigma_{X^{(n)}}^2 = n(p/n)(1 - p/n) = p(1 - p/n)$. If we take $n \rightarrow \infty$, we can see that these numbers stabilize to $\mu_{X^{(\infty)}} = p$ and $\sigma_{X^{(\infty)}}^2 = p$. This indicates that there could be some random variable we can define in this infinite subdivision limit.
This should not come as too much of a surprise, since in the real world we can just count the number of bus arrivals; however, it is nice to see that our mathematical model is well defined. This discussion can be made formal as the law of rare events.
Following through this reasoning carefully, we can arrive at the following model. We will say that $X \sim \textrm{Poisson}(\lambda)$ if it is a random variable which takes the values $\{0, 1, 2, \ldots\}$ with probability

$$p_k = \frac{\lambda^k e^{-\lambda}}{k!}.$$

The value $\lambda > 0$ is known as the rate (or the shape parameter), and denotes the average number of arrivals we expect in one unit of time.

We may sum this probability mass function to get the cumulative distribution function

$$F(x) = \begin{cases} 0 & x < 0, \\ e^{-\lambda} \sum_{m=0}^k \frac{\lambda^m}{m!} & k \le x < k+1 \textrm{ with } 0 \le k. \end{cases}$$
Let’s first plot the probability mass function (22.8.13).
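The plotting cell is not reproduced here; a sketch of the p.m.f. with an illustrative rate of lambda = 5 (which also matches the samples further below) is:

from math import exp, factorial

lam = 5

xs = list(range(20))
# Poisson p.m.f.: p_k = lambda^k * exp(-lambda) / k!
pmf = [lam**k * exp(-lam) / factorial(k) for k in xs]

d2l.plt.stem(xs, pmf)
d2l.plt.xlabel('x')
d2l.plt.ylabel('p.m.f.')
d2l.plt.show()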
Now, let’s plot the cumulative distribution function (22.8.14).
As we saw above, the means and variances are particularly concise. If $X \sim \textrm{Poisson}(\lambda)$, then:

$$\mu_X = \lambda, \quad \sigma_X^2 = \lambda.$$
This can be sampled as follows.
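The sampling cell is not included; a PyTorch sketch with rate 5 (consistent with the outputs below, whose values cluster around 5) is:

# draw a 10 x 10 array of Poisson(5) samples
m = torch.distributions.poisson.Poisson(5.0)
m.sample((10, 10))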
tensor([[ 1., 4., 6., 8., 4., 4., 4., 7., 6., 4.],
[ 3., 6., 7., 7., 5., 7., 7., 3., 5., 4.],
[ 4., 1., 3., 3., 10., 5., 5., 3., 7., 5.],
[ 4., 3., 4., 10., 8., 6., 4., 6., 5., 5.],
[ 5., 11., 1., 5., 7., 5., 2., 4., 3., 5.],
[ 6., 6., 4., 4., 3., 1., 5., 8., 4., 5.],
[ 2., 9., 7., 2., 6., 5., 2., 8., 6., 10.],
[ 1., 4., 3., 7., 3., 1., 7., 5., 3., 6.],
[ 5., 4., 6., 4., 9., 8., 3., 3., 1., 8.],
[ 3., 12., 9., 13., 2., 14., 3., 2., 0., 3.]])
array([[ 5, 5, 4, 2, 13, 7, 8, 6, 6, 5],
[ 6, 3, 4, 5, 4, 2, 1, 3, 6, 3],
[ 6, 5, 3, 4, 4, 4, 2, 3, 2, 5],
[ 2, 8, 4, 7, 7, 7, 5, 6, 2, 6],
[ 3, 4, 3, 0, 7, 2, 6, 6, 7, 4],
[ 4, 1, 5, 0, 3, 3, 3, 6, 4, 3],
[ 4, 5, 4, 6, 4, 5, 3, 6, 9, 7],
[ 4, 2, 5, 3, 5, 5, 2, 6, 10, 5],
[ 5, 4, 5, 3, 6, 5, 2, 3, 6, 3],
[ 8, 7, 9, 6, 3, 7, 11, 7, 13, 2]])
<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[ 6., 5., 3., 5., 4., 6., 5., 8., 4., 4.],
[ 6., 4., 3., 7., 9., 4., 4., 4., 7., 6.],
[ 9., 4., 5., 3., 6., 4., 1., 5., 4., 7.],
[ 7., 3., 4., 7., 5., 2., 3., 1., 6., 4.],
[ 4., 6., 4., 10., 5., 4., 4., 6., 2., 0.],
[ 3., 4., 5., 2., 5., 2., 3., 4., 3., 7.],
[ 2., 7., 6., 5., 3., 7., 6., 7., 4., 3.],
[ 4., 6., 4., 7., 4., 5., 2., 4., 5., 4.],
[ 6., 6., 3., 6., 4., 7., 6., 5., 3., 5.],
[ 5., 7., 2., 5., 2., 9., 9., 2., 3., 4.]], dtype=float32)>
22.8.6. Gaussian
Now let's try a different, but related experiment. Let's say we again are performing $n$ independent $\textrm{Bernoulli}(p)$ measurements $X_i$. The distribution of the sum of these is $X^{(n)} \sim \textrm{Binomial}(n, p)$. Rather than taking a limit as $n$ increases and $p$ decreases, let's fix $p$ and send $n \rightarrow \infty$. In this case $\mu_{X^{(n)}} = np \rightarrow \infty$ and $\sigma_{X^{(n)}}^2 = np(1-p) \rightarrow \infty$, so there is no reason to think this limit should be well defined.

However, not all hope is lost! Let's just make the mean and variance be well behaved by defining

$$Y^{(n)} = \frac{X^{(n)} - \mu_{X^{(n)}}}{\sigma_{X^{(n)}}}.$$
This can be seen to have mean zero and variance one, and so it is plausible to believe that it will converge to some limiting distribution. If we plot what these distributions look like, we will become even more convinced that it will work.
# PyTorch version
p = 0.2
ns = [1, 10, 100, 1000]
d2l.plt.figure(figsize=(10, 3))
for i in range(4):
    n = ns[i]
    pmf = torch.tensor([p**i * (1-p)**(n-i) * binom(n, i)
                        for i in range(n + 1)])
    d2l.plt.subplot(1, 4, i + 1)
    d2l.plt.stem([(i - n*p)/torch.sqrt(torch.tensor(n*p*(1 - p)))
                  for i in range(n + 1)], pmf,
                 use_line_collection=True)
    d2l.plt.xlim([-4, 4])
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.title("n = {}".format(n))
d2l.plt.show()
# NumPy (np) version
p = 0.2
ns = [1, 10, 100, 1000]
d2l.plt.figure(figsize=(10, 3))
for i in range(4):
    n = ns[i]
    pmf = np.array([p**i * (1-p)**(n-i) * binom(n, i) for i in range(n + 1)])
    d2l.plt.subplot(1, 4, i + 1)
    d2l.plt.stem([(i - n*p)/np.sqrt(n*p*(1 - p)) for i in range(n + 1)], pmf,
                 use_line_collection=True)
    d2l.plt.xlim([-4, 4])
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.title("n = {}".format(n))
d2l.plt.show()
# TensorFlow version
p = 0.2
ns = [1, 10, 100, 1000]
d2l.plt.figure(figsize=(10, 3))
for i in range(4):
    n = ns[i]
    pmf = tf.constant([p**i * (1-p)**(n-i) * binom(n, i)
                       for i in range(n + 1)])
    d2l.plt.subplot(1, 4, i + 1)
    d2l.plt.stem([(i - n*p)/tf.sqrt(tf.constant(n*p*(1 - p)))
                  for i in range(n + 1)], pmf,
                 use_line_collection=True)
    d2l.plt.xlim([-4, 4])
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')
    d2l.plt.title("n = {}".format(n))
d2l.plt.show()
One thing to note: compared to the Poisson case, we are now dividing by the standard deviation which means that we are squeezing the possible outcomes into smaller and smaller areas. This is an indication that our limit will no longer be discrete, but rather continuous.
A derivation of what occurs is beyond the scope of this document, but the central limit theorem states that as $n \rightarrow \infty$, this will converge to the Gaussian distribution (or sometimes the normal distribution). More explicitly, for any $a, b$:

$$\lim_{n \rightarrow \infty} P(Y^{(n)} \in [a, b]) = P(\mathcal{N}(0, 1) \in [a, b]),$$

where we say a random variable is normally distributed with given mean $\mu$ and variance $\sigma^2$, written $X \sim \mathcal{N}(\mu, \sigma^2)$, if $X$ has density

$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
Let’s first plot the probability density function (22.8.17).
Now, let’s plot the cumulative distribution function. It is beyond the
scope of this appendix, but the Gaussian c.d.f. does not have a
closed-form formula in terms of more elementary functions. We will use
erf
which provides a way to compute this integral numerically.
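The plotting cells are not reproduced in this extract; a sketch of the standard normal density together with its c.d.f. computed through erf (assuming mu = 0 and sigma = 1) could be:

from math import erf, pi, sqrt

mu, sigma = 0, 1
x = torch.arange(-4, 4, 0.01)

# density of N(mu, sigma^2)
p = torch.exp(-(x - mu)**2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)
# c.d.f. via the error function: F(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2
F = torch.tensor([(1 + erf((xi - mu) / (sigma * sqrt(2)))) / 2 for xi in x.tolist()])

d2l.plt.plot(x, p, label='p.d.f.')
d2l.plt.plot(x, F, label='c.d.f.')
d2l.plt.xlabel('x')
d2l.plt.legend()
d2l.plt.show()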
Keen-eyed readers will recognize some of these terms. Indeed, we encountered this integral in Section 22.5, and we need exactly that computation to see that this $p_X(x)$ has total integral one and is thus a valid density.
Our choice of working with coin flips made computations shorter, but nothing about that choice was fundamental. Indeed, if we take any collection of independent identically distributed random variables $X_i$ and form

$$X^{(N)} = \sum_{i=1}^N X_i,$$

then

$$\frac{X^{(N)} - \mu_{X^{(N)}}}{\sigma_{X^{(N)}}}$$

will be approximately Gaussian. There are additional requirements needed to make it work, most commonly $E[X_i^4] < \infty$, but the philosophy is clear.
The central limit theorem is the reason why the Gaussian is fundamental to probability, statistics, and machine learning. Whenever we can say that something we measured is a sum of many small independent contributions, we can assume that the thing being measured will be close to Gaussian.
There are many more fascinating properties of Gaussians, and we would like to discuss one more here. The Gaussian is what is known as a maximum entropy distribution. We will get into entropy more deeply in Section 22.11, however all we need to know at this point is that it is a measure of randomness. In a rigorous mathematical sense, we can think of the Gaussian as the most random choice of random variable with fixed mean and variance. Thus, if we know that our random variable has some mean and variance, the Gaussian is in a sense the most conservative choice of distribution we can make.
To close the section, let's recall that if $X \sim \mathcal{N}(\mu, \sigma^2)$, then:

$$\mu_X = \mu, \quad \sigma_X^2 = \sigma^2.$$
We can sample from the Gaussian (or standard normal) distribution as shown below.
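The sampling cell is not shown; a one-line PyTorch sketch for drawing from the standard normal (the first output below; the other two come from the NumPy and TensorFlow counterparts) is:

# 10 x 10 array of N(0, 1) samples
torch.normal(0.0, 1.0, size=(10, 10))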
tensor([[ 1.3588, 0.0473, -1.5805, -0.0108, 0.4253, 0.7924, -0.6547, 0.7313,
-0.3038, 1.1935],
[ 0.0089, 0.8951, 1.0055, 0.0956, -1.1109, -0.6342, 1.6772, 1.0314,
0.3819, -1.7822],
[-0.0604, -1.0318, 0.9113, 1.3118, -1.8370, -0.9023, 1.0365, 0.9052,
-0.6411, -0.8949],
[-0.1713, -0.2347, 0.0767, -0.6375, -0.4612, -1.6875, -0.1570, 1.0591,
0.8377, 0.5097],
[ 0.2762, -0.6213, -0.3422, 0.9449, -0.7544, -0.2150, 1.0240, 1.0253,
-0.9182, 1.1536],
[ 0.0614, 0.2758, -0.3610, -1.0577, -0.5513, -0.9158, 0.7539, 0.9204,
-0.5908, 0.9113],
[ 1.6190, -0.9213, -0.7944, -2.2621, 0.5826, -1.8287, 1.4097, -0.5744,
-0.0668, 1.2074],
[-0.0624, 0.1928, 1.3002, 0.6756, 1.1590, 1.0144, 1.1840, -0.5010,
0.6026, -0.7722],
[-2.0148, 0.6958, 0.9940, 0.8477, 1.0957, -0.5253, 0.2353, -0.2663,
1.2275, 0.5993],
[ 0.4651, -0.8218, -0.5441, -2.0338, -0.6930, -0.0674, -0.4448, -0.8397,
0.0360, -0.7089]])
array([[-0.11992579, 0.11242172, -0.35572603, 0.58136987, 0.12435943,
0.75733951, -0.13772477, -0.10270837, -1.59153191, -0.94093858],
[ 1.01421669, -0.64482199, -1.19968905, -0.29650658, 0.21354805,
-0.233707 , -0.84922388, 0.38375312, -0.3886712 , -0.28680926],
[ 0.26912722, 0.3832668 , -1.56047648, 1.55956818, -0.84004616,
-0.35190349, -0.54684824, 0.83748666, -0.95408109, 0.61570842],
[ 1.42284436, 1.47742409, 1.24482391, -0.85638551, -0.78176885,
0.78364858, 0.3804224 , 0.68402399, -1.51515355, 0.77536699],
[ 0.80657544, -2.01318421, 0.0262837 , 0.14704248, -1.05968065,
0.09993582, 0.3437732 , 0.71795499, 2.40652949, -0.24287448],
[ 0.60314452, 0.96139177, 0.42617912, -1.50385243, 1.89889768,
-0.18784024, -0.29100909, -0.61710869, 1.00194018, 0.81604849],
[ 0.27520902, -1.01320489, -1.32230684, 0.91961478, 1.08834228,
1.52541641, 0.83242223, -0.70249323, -1.41539373, 0.35746912],
[-0.37485341, -0.81440897, 0.64964391, -2.64441164, 0.51285708,
-0.00280402, -0.36267136, -0.89061862, -0.2587532 , 1.36505027],
[ 0.30396154, -1.17431444, 0.3697711 , -0.58526674, -1.00467336,
1.80141639, 0.44061838, 0.66772324, 0.00462039, -1.1309502 ],
[-0.28877008, 0.89796664, -0.80642533, -1.38372865, -0.72438918,
0.34978787, 0.9175374 , -0.43026127, -0.409859 , -1.43388418]])
<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[-0.47082466, 0.8840835 , 0.13613819, -0.03887384, -1.3103229 ,
-0.87125725, 0.1467262 , -1.205572 , -0.9371071 , 0.5023256 ],
[-1.180266 , -1.1548375 , 1.3918712 , 0.43585238, 0.01733993,
-1.790916 , 0.12158127, -0.18356071, -0.28034893, -0.68480223],
[-2.3942738 , -1.0831766 , 0.2986123 , -0.11818152, 1.964042 ,
0.32228935, -0.20232098, 1.050008 , 0.68574095, -0.42878217],
[-0.2769131 , -2.0021179 , 1.4159348 , 0.22262587, 0.43598378,
-0.46475738, -0.6122648 , -1.0528542 , -0.99552286, -1.0606335 ],
[ 2.1345575 , -1.1459693 , 0.17686844, -0.9734485 , -0.94634855,
1.3928679 , -0.5110315 , 0.4557909 , 1.3669354 , -0.2503584 ],
[-0.96597624, -1.3229077 , -0.09891371, 0.6545881 , -0.13871759,
-0.32090858, 0.82951075, -1.2182976 , 0.4526086 , -0.41823685],
[-0.46264172, -1.0363445 , 0.7605979 , -1.1535795 , -0.97582847,
1.0007198 , 0.6450034 , -0.6664228 , -0.63123536, -0.07606531],
[ 0.6581902 , -0.18795264, -1.2491583 , -1.1792243 , -1.6373378 ,
-1.1988202 , 1.2502977 , 0.7889295 , -0.17174181, -0.37365198],
[ 1.4740059 , 1.1723006 , -0.25428358, -0.7858001 , 0.9736877 ,
0.716497 , -0.82188153, -0.11518795, -0.8567569 , -0.730805 ],
[-1.2619729 , 1.128404 , 0.4920151 , -0.3575905 , -0.4083109 ,
-0.06316691, 0.7730259 , -0.8047515 , 0.72060764, -0.3748437 ]],
dtype=float32)>
22.8.7. Exponential Family
One shared property for all the distributions listed above is that they all belong to what is known as the exponential family. The exponential family is a set of distributions whose density can be expressed in the following form:

$$p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x}) \cdot \exp\left(\boldsymbol{\eta}^{\top} \cdot T(\mathbf{x}) - A(\boldsymbol{\eta})\right).$$

As this definition can be a little subtle, let's examine it closely.

First, $h(\mathbf{x})$ is known as the underlying measure or the base measure. This can be viewed as an original choice of measure that we are modifying with our exponential weight.

Second, we have the vector $\boldsymbol{\eta} = (\eta_1, \eta_2, \ldots, \eta_l) \in \mathbb{R}^l$ called the natural parameters or canonical parameters. These define how the base measure will be modified: they enter through a dot product with the sufficient statistics $T(\mathbf{x}) = (T_1(\mathbf{x}), T_2(\mathbf{x}), \ldots, T_l(\mathbf{x}))$, which is then exponentiated. The statistics are called sufficient because the value of $T(\mathbf{x})$ is all the information from the sample $\mathbf{x}$ that is needed to compute the density.

Third, we have the cumulant function $A(\boldsymbol{\eta})$, which ensures that the above distribution integrates to one, i.e.,

$$A(\boldsymbol{\eta}) = \log\left[\int h(\mathbf{x}) \cdot \exp\left(\boldsymbol{\eta}^{\top} \cdot T(\mathbf{x})\right) \, d\mathbf{x}\right].$$
To be concrete, let's consider the Gaussian. Assuming that $x$ is a univariate variable, we saw that it had a density of

$$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}} \cdot \exp\left(\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \left(\frac{1}{2\sigma^2}\mu^2 + \log\sigma\right)\right).$$

This matches the definition of the exponential family with:

underlying measure: $h(x) = \frac{1}{\sqrt{2\pi}}$,
natural parameters: $\boldsymbol{\eta} = \begin{bmatrix}\eta_1 \\ \eta_2\end{bmatrix} = \begin{bmatrix}\frac{\mu}{\sigma^2} \\ \frac{1}{2\sigma^2}\end{bmatrix}$,
sufficient statistics: $T(x) = \begin{bmatrix}x \\ -x^2\end{bmatrix}$, and
cumulant function: $A(\boldsymbol{\eta}) = \frac{1}{2\sigma^2}\mu^2 + \log\sigma = \frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\log(2\eta_2)$.
It is worth noting that the exact choice of each of the above terms is somewhat arbitrary. Indeed, the important feature is that the distribution can be expressed in this form, not the exact form itself.
As we allude to in Section 4.1.2.2, a widely used technique is to assume that the final output $\mathbf{y}$ follows an exponential family distribution. The exponential family is a common and powerful family of distributions encountered frequently in machine learning.
22.8.8. Summary
Bernoulli random variables can be used to model events with a yes/no outcome.
Discrete uniform distributions model selections from a finite set of possibilities.
Continuous uniform distributions select from an interval.
Binomial distributions model a series of Bernoulli random variables, and count the number of successes.
Poisson random variables model the arrival of rare events.
Gaussian random variables model the result of adding a large number of independent random variables together.
All the above distributions belong to the exponential family.
22.8.9. Exercises
1. What is the standard deviation of a random variable that is the difference $X - Y$ of two independent binomial random variables $X, Y \sim \textrm{Binomial}(16, 1/2)$?
2. If we take a Poisson random variable $X \sim \textrm{Poisson}(\lambda)$ and consider $(X - \lambda)/\sqrt{\lambda}$ as $\lambda \rightarrow \infty$, we can show that this becomes approximately Gaussian. Why does this make sense?
3. What is the probability mass function for a sum of two discrete uniform random variables on $n$ elements?