Last active: July 26, 2020 15:28
Gist: DanielTakeshi/7f90c6a508678e04714933378f13c483
How to sample from a log-uniform distribution.
""" | |
How we might sample from a log-uniform distribution | |
https://stats.stackexchange.com/questions/155552/what-does-log-uniformly-distribution-mean | |
Only run one of these three cases at a time, otherwise the plots update each | |
other. Run with these versions: | |
matplotlib 3.2.1 | |
numpy 1.18.3 | |
""" | |
import numpy as np | |
import matplotlib.pyplot as plt | |
np.set_printoptions(suppress=True, linewidth=200, edgeitems=10) | |
low = 0.01 | |
high = 0.7 | |
size = 10000 | |
nb_bins = 50 | |
if False: | |
data = np.random.uniform(low=low, high=high, size=size) | |
count, bins, ignored = plt.hist(data, bins=nb_bins, align='mid') | |
plt.title('Uniform({}, {})'.format(low, high)) | |
plt.xlabel('Epsilon') | |
plt.savefig('distr_uniform.png') | |
if False: | |
data = np.random.uniform(low=np.log(low), high=np.log(high), size=size) | |
count, bins, ignored = plt.hist(data, bins=nb_bins, align='mid') | |
plt.title('Uniform(log({}), log({})'.format(low, high)) | |
plt.xlabel('Epsilon') | |
plt.savefig('distr_uniform_log.png') | |
# Number of classes are the number of intervals. | |
nb_classes = 5 + 1 | |
if True: | |
data = np.random.uniform(low=np.log(low), high=np.log(high), size=size) | |
discretized = np.linspace(np.log(low), np.log(high), num=nb_classes) | |
data = np.exp(data) | |
count, bins, ignored = plt.hist(data, bins=nb_bins, align='mid') | |
plt.title('exp( Uniform(log({}), log({}) )'.format(low, high)) | |
plt.xlabel('Epsilon') | |
plt.savefig('distr_uniform_log_true.png') | |
# Now let's add dicretized ranges. | |
print('Discretized bounds (len {}) for epsilons:\nLog: {}\nNormal: {}'.format( | |
len(discretized), discretized, np.exp(discretized))) | |
for idx,item in enumerate(discretized): | |
plt.axvline(x=np.exp(item), color='black') | |
if idx < len(discretized) - 1: | |
start = np.exp(discretized[idx]) | |
end = np.exp(discretized[idx+1]) | |
count = np.sum( (start <= data) & (data < end) ) | |
print('{:.3f} <= x < {:.3f} count: {}'.format(start, end, count)) | |
plt.savefig('distr_uniform_log_true_bounds.png') |
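As a quick numerical sanity check (my addition, not part of the original gist): a log-uniform sample puts roughly equal mass in equal *multiplicative* intervals, so log-spaced bins should each receive close to `size / nb_bins` samples:

```python
import numpy as np

rng = np.random.default_rng(0)
low, high, size = 0.01, 0.7, 10000

# Sample log-uniformly: uniform in log-space, then exponentiate.
data = np.exp(rng.uniform(np.log(low), np.log(high), size=size))

# Log-spaced bin edges; each of the 5 bins should hold roughly 2000 samples.
edges = np.exp(np.linspace(np.log(low), np.log(high), num=6))
counts, _ = np.histogram(data, bins=edges)
print(counts)  # each entry close to size / 5 = 2000
```

The same check with linearly spaced bins would show counts heavily skewed toward the small end, which is the whole point of sampling in log-space.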
Note that this is similar to, but not quite the same as, calling something like torch.logspace. What torch.logspace does is return a fixed set of values within a range that are spaced logarithmically. E.g., for 5 and 10 evenly log-spaced points between 0.01 and 0.1, torch.logspace(math.log10(0.1), math.log10(0.01), steps=5) (and steps=10) give:
tensor([0.1000, 0.0562, 0.0316, 0.0178, 0.0100])
tensor([0.1000, 0.0774, 0.0599, 0.0464, 0.0359, 0.0278, 0.0215, 0.0167, 0.0129, 0.0100])
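The same deterministic spacing can be reproduced in NumPy, which keeps this gist torch-free; np.logspace, like torch.logspace, raises 10 to a linearly spaced grid of exponents:

```python
import numpy as np

# np.logspace(start, stop, num) returns 10 ** np.linspace(start, stop, num),
# matching torch.logspace's default base-10 behavior.
vals5 = np.logspace(np.log10(0.1), np.log10(0.01), num=5)
vals10 = np.logspace(np.log10(0.1), np.log10(0.01), num=10)
print(vals5)   # [0.1, 0.0562..., 0.0316..., 0.0178..., 0.01]
```

Note these are fixed grid points, not random samples; exponentiating a *uniform* draw over the same log-range is what turns the grid idea into a distribution.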
(July 26) Now log(0.01) to log(0.7) with discretized bins.
Here is a plot which also has nb_classes=5+1 (because nb_classes is really the number of vertical ticks, i.e. interval boundaries).
For the 5 classes, I get:
0.010 <= x < 0.023 count: 1998
0.023 <= x < 0.055 count: 1968
0.055 <= x < 0.128 count: 2020
0.128 <= x < 0.299 count: 2021
0.299 <= x < 0.700 count: 1993
If instead nb_classes=10+1, then we get:
0.010 <= x < 0.015 count: 991
0.015 <= x < 0.023 count: 1034
0.023 <= x < 0.036 count: 1043
0.036 <= x < 0.055 count: 968
0.055 <= x < 0.084 count: 983
0.084 <= x < 0.128 count: 1020
0.128 <= x < 0.196 count: 1051
0.196 <= x < 0.299 count: 1022
0.299 <= x < 0.458 count: 961
0.458 <= x < 0.700 count: 927
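Going the other direction, mapping a sampled epsilon back to its class index, can be done with np.digitize over the same bounds. This is a sketch of my own, not something in the original gist; the example epsilons are arbitrary:

```python
import numpy as np

low, high, nb_classes = 0.01, 0.7, 10 + 1
bounds = np.exp(np.linspace(np.log(low), np.log(high), num=nb_classes))

# np.digitize returns the index of the interval each value falls in;
# subtract 1 so classes run 0..9 for epsilons inside [low, high).
eps = np.array([0.012, 0.05, 0.3, 0.699])
classes = np.digitize(eps, bounds) - 1
print(classes)  # [0 3 8 9]
```

The printed bounds match the interval edges listed above (0.015, 0.023, 0.036, ...), so e.g. 0.05 lands in the 0.036 <= x < 0.055 interval, class 3.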
Plots (0.01 to 0.2): the standard uniform distribution; a visualization of what the log looks like; and the exponentiated version, which is what we will actually be using.