Created
January 9, 2020 18:16
-
-
Save salotz/0158a99a75078b47538452111ec0faa2 to your computer and use it in GitHub Desktop.
Clean numpy-only implementation of the Shimazaki-Shinimoto histogram binning method as a function.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def shimazaki_shinimoto_binning(x, min_bins, max_bins): | |
"""The Shimazaki-Shinimoto histogram binning algorithm for choosing an | |
optimal constant number of bins. | |
Parameters | |
---------- | |
x : array of int or float | |
The data you want to bin | |
min_bins : int | |
The minimum number of bins to consider for optimization. Must | |
be at least 1. | |
max_bins : int | |
The maximum number of bins to consider for optimization. | |
Returns | |
------- | |
optimal_bin_size : float | |
The optimal bin size calculated | |
bin_edges : array of float | |
The bin edges for the data using the optimal bin size | |
Notes | |
----- | |
Created on Thu Oct 25 11:32:47 2012 | |
Histogram Binwidth Optimization Method | |
Shimazaki and Shinomoto, Neural Comput 19 1503-1527, 2007 | |
2006 Author Hideaki Shimazaki, Matlab | |
Department of Physics, Kyoto University | |
shimazaki at ton.scphys.kyoto-u.ac.jp | |
Please feel free to use/modify this program. | |
Version in python adapted Érbet Almeida Costa | |
Bugfix by Takuma Torii 2.24.2013 | |
Adapted as a library function and commented etc. by Samuel D. Lotz | |
2020-1-9 from: | |
http://176.32.89.45/~hideaki/res/histogram.html#Python1D | |
archived at: | |
https://web.archive.org/save/http://176.32.89.45/~hideaki/res/histogram.html#Python1D | |
References | |
---------- | |
@article{Shimazaki_2007, | |
doi = {10.1162/neco.2007.19.6.1503}, | |
url = {https://doi.org/10.1162%2Fneco.2007.19.6.1503}, | |
year = 2007, | |
month = {jun}, | |
publisher = {{MIT} Press - Journals}, | |
volume = {19}, | |
number = {6}, | |
pages = {1503--1527}, | |
author = {Hideaki Shimazaki and Shigeru Shinomoto}, | |
title = {A Method for Selecting the Bin Size of a Time Histogram}, | |
journal = {Neural Computation} | |
} | |
""" | |
import numpy as np | |
assert min_bins > 1, "must be more than 1 bin" | |
x_max = np.max(x) | |
x_min = np.min(x) | |
# The minimum and maximum number of bins will determine the number | |
# of trials in between them. This will go into the construction of | |
# the cost function to determine the optimal bin size | |
trials_num_bins = np.arange(min_bins, max_bins) | |
num_trials = len(trials_num_bins) | |
# compute the bin size for each trial based on the min and max of | |
# the data | |
trials_bin_sizes = (x_max - x_min) / trials_num_bins | |
# then for each trial we will have a cost value associated with it | |
cost_vector = [] | |
# Computation of the cost values for each bin size trial | |
for trial_idx, num_bins in enumerate(trials_num_bins): | |
num_edges = num_bins[trial_idx] + 1 | |
# bin edges are linearly spaced for this range | |
edges = np.linspace(x_min, x_max, num_edges) | |
# count number of events in bins | |
trial_hist, _ = np.histogram(x, bins=edges) | |
# mean of event count | |
mean_count = np.mean(trial_hist) | |
# variance of event count | |
var_count = sum((trial_hist - mean_count)**2) / num_bins[trial_idx] | |
# compute the cost for this trial | |
cost = (2 * mean_count - var_count) /\ | |
(trials_bin_sizes[trial_idx] ** 2) | |
cost_vector.append(cost) | |
# Optimal Bin Size Selection | |
# just find the minimal cost | |
min_idx = np.argmin(cost_vector) | |
min_cost = cost_vector[min_idx] | |
# then choose the number of bins associated with that | |
optimal_bin_size = trials_bin_sizes[min_idx] | |
edges = np.arange(x_min, x_max, num_bins[min_idx]+1) | |
return optimal_bin_size, edges |
Good to know! I didn't actually write the business code. I just jammed it into a function from some code the authors wrote. I never ended up using it. If you make a fixed version post a link.
If anyone have fixed version please post it
Hello! I stumbled over here while looking for this method... The code above seemed to have some inconsistencies as @AstroPierre pointed out. @salotz and @saravanansaminathan, find below a cleaner Python 3 reproduction from the archived version that may work for you. I hope this helps.
import numpy as np
import matplotlib.pyplot as plt
# Generate n pseudo-random numbers with(mu,sigma,n)
x = np.random.normal(0, 100, 100)
x_min, x_max = np.min(x), np.max(x)
N_MIN = 4 # Min number of bins (integer), must be > 1
N_MAX = 50 # Max number of bins (integer)
N = np.arange(N_MIN,N_MAX) # number of bins
D = (x_max-x_min)/N # Bin size vector
C = np.zeros(np.size(D))
# Computation of the cost function
for i in range(np.size(N)):
edges = np.linspace(x_min,x_max,N[i]+1) # Bin edges
ki = plt.hist(x,edges,alpha=0.5)[0] # Count number of events in bins
k = np.mean(ki) # Mean of event count
v = np.sum((ki-k)**2)/N[i] # Variance of event count
C[i] = (2*k-v)/((D[i])**2) # Cost Function
# Optimal bin size Selection
cmin = np.min(C)
idx = np.where(C == cmin)[0][0]
optD = D[idx]
# Plotting
edges = np.linspace(x_min,x_max,N[idx]+1)
plt.figure(figsize=(12,4))
plt.subplot(121)
plt.hist(x,edges)
plt.title("Histogram")
plt.ylabel("Frequency")
plt.xlabel("Value")
#plt.savefig('Hist.png')
plt.subplot(122)
plt.plot(D,C,'.',alpha=0.5)
plt.plot(optD,cmin,'*r');
#plt.savefig('Fobj.png')
Thank you Gustavo!
Le jeu. 10 mars 2022 à 00:56, Gustavo Oliveira ***@***.***> a
écrit :
… ***@***.**** commented on this gist.
------------------------------
Hello! I stumbled over here while looking for this method... The code
above seemed to have some inconsistencies as @AstroPierre
<https://github.com/AstroPierre> pointed out. @salotz
<https://github.com/salotz> and @saravanansaminathan
<https://github.com/saravanansaminathan>, find below a cleaner Python 3
reproduction from the archived version that may work for you. I hope this
helps.
import numpy as npimport matplotlib.pyplot as plt
# Generate n pseudo-random numbers with(mu,sigma,n)x = np.random.normal(0, 100, 100)
x_min, x_max = np.min(x), np.max(x)
N_MIN = 4 # Min number of bins (integer), must be > 1N_MAX = 50 # Max number of bins (integer)
N = np.arange(N_MIN,N_MAX) # number of binsD = (x_max-x_min)/N # Bin size vectorC = np.zeros(np.size(D))
# Computation of the cost functionfor i in range(np.size(N)):
edges = np.linspace(x_min,x_max,N[i]+1) # Bin edges
ki = plt.hist(x,edges,alpha=0.5)[0] # Count number of events in bins
k = np.mean(ki) # Mean of event count
v = np.sum((ki-k)**2)/N[i] # Variance of event count
C[i] = (2*k-v)/((D[i])**2) # Cost Function
# Optimal bin size Selectioncmin = np.min(C)idx = np.where(C == cmin)[0][0]optD = D[idx]
# Plottingedges = np.linspace(x_min,x_max,N[idx]+1)plt.figure(figsize=(12,4))plt.subplot(121)plt.hist(x,edges)plt.title("Histogram")plt.ylabel("Frequency")plt.xlabel("Value")#plt.savefig('Hist.png')
plt.subplot(122)plt.plot(D,C,'.',alpha=0.5)plt.plot(optD,cmin,'*r');#plt.savefig('Fobj.png')
—
Reply to this email directly, view it on GitHub
<https://gist.github.com/0158a99a75078b47538452111ec0faa2#gistcomment-4092015>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/APM53UKI2OSB6FX4JJC5LUTU7GFJ3ANCNFSM4RPKWK7A>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
There are a few mistakes related to the num_bins variable which is used as a vector instead of a scalar. Hence, there is some confusion between num_bins and trials_num_bins. Easy to fix.