-
-
Save tupui/3b79ecea1631e8925f3d47069d435b0f to your computer and use it in GitHub Desktop.
"""Orthogonal Latin Hypercube Sampling. | |
--------------------------- | |
MIT License | |
Copyright (c) 2021 Pamphile Tupui ROY | |
Permission is hereby granted, free of charge, to any person obtaining a copy | |
of this software and associated documentation files (the "Software"), to deal | |
in the Software without restriction, including without limitation the rights | |
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
copies of the Software, and to permit persons to whom the Software is | |
furnished to do so, subject to the following conditions: | |
The above copyright notice and this permission notice shall be included in all | |
copies or substantial portions of the Software. | |
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |
SOFTWARE. | |
""" | |
import functools | |
import numpy as np | |
from scipy import spatial | |
from scipy.stats import qmc | |
from scipy import optimize | |
import matplotlib.pyplot as plt | |
import seaborn as sns | |
def oa_lhs(p, d, seed): | |
oa_sample = np.zeros(shape=(p**2, p+1)) | |
arrays = np.tile(np.arange(p), (2, 1)) | |
oa_sample[:, :2] = np.stack(np.meshgrid(*arrays), axis=-1).reshape(-1, 2) | |
for p_ in range(1, p): | |
oa_sample[:, 2+p_-1] = np.mod(oa_sample[:, 0] + p_*oa_sample[:, 1], p) | |
# scramble the OA | |
oa_sample_ = np.empty(shape=(p**2, p+1)) | |
for j in range(p+1): | |
perms = np.random.permutation(p) | |
for k in range(p): | |
idx = np.where(oa_sample[:, j] == k)[0] | |
oa_sample_[idx, j] = perms[k] | |
oa_sample = oa_sample_ | |
# sample is a randomized OA from now | |
# and the following is making it an OA-LHS | |
oa_lhs_sample = np.zeros(shape=(p**2, p+1)) | |
for j in range(p+1): | |
for k in range(p): | |
idx = np.where(oa_sample[:, j] == k)[0] | |
lhs = LatinHypercube(d=1, centered=True, seed=seed).random(p).flatten() | |
oa_lhs_sample[:, j][idx] = lhs + oa_sample[:, j][idx] | |
oa_lhs_sample /= p | |
if d is not None: | |
oa_lhs_sample = oa_lhs_sample[:, :d] | |
return oa_lhs_sample | |
rng = np.random.default_rng() | |
p, d = 7, 3 | |
oa_lhs_sample = oa_lhs(p, d, seed=rng) | |
sns.pairplot(pd.DataFrame(oa_lhs_sample), diag_kind="hist", corner=True, diag_kws={"bins": p}) |
I was not sure about this because it would mean that the permutations are not independent. So we change all the 0 for a 1 in all columns. Whereas the current version changes only one column at a time. If we go with your version, I don't see how this is doing any scrambling. In the end, we just renamed the symbols if we do this.
Ah, you are right! I now have an intuition that permutations should be independently sampled for each column. Then the code near line 71 does not need any fix.
I could not run your snippet.
If runs once you define oa_lhs_sample
and p
.
import numpy as np
# define OA
p = 2
oa_lhs_sample = np.array([[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]).T / 2 + 0.25
for i in range(p):
for j in range(i+1, p):
samples_2d = oa_lhs_sample[:, [i, j]]
cart_prod = ((samples_2d * p) // 1).astype(int) # should be an cartesian product of [0, 1, ..., p-1] x [0, 1, ..., p-1]
count_cells = np.zeros([p, p])
count_cells[cart_prod] += 1 # all pairs of 0 to p-1 must be counted once
assert np.all(count_cells == 1)
Thanks for your inputs @kstoneriv3. I opened a PR if you did not see scipy/scipy#14546. Your test to check that we have an OA would be a nice addition. I will see how I can make this work. Or you're free to propose something over there.
@kstoneriv3 thanks for the feedbacks!
I was not sure about this because it would mean that the permutations are not independent. So we change all the 0 for a 1 in all columns. Whereas the current version changes only one column at a time. If we go with your version, I don't see how this is doing any scrambling. In the end we just renamed the symbols if we do this.
Yes.
Totally, it's a mistake of mine. Thanks.
I could not run your snippet.