@Perif
Last active August 14, 2017 15:11
Bins Analysis with Numpy
import pickle as pkl
import numpy as np
import seaborn as sns
# load data
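with open('y_scores_20.pkl', 'rb') as f:  # close the file handle once loading is done
    scores = np.asarray(pkl.load(f))  # boolean-mask indexing below needs an ndarray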
# create bins
bins = np.linspace(0, 1, 157) # 157 edges give 156 equal-width bins on [0, 1]; the number was chosen arbitrarily
digitized = np.digitize(scores, bins)
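# mean score per bin; an empty bin yields NaN (with a RuntimeWarning), and a score
# of exactly 1.0 gets index len(bins) from np.digitize, so it is not covered here
bin_means = [scores[digitized == i].mean() for i in range(1, len(bins))]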
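# A hedged sketch of the same per-bin means without a Python loop.
# `binned_means` is a hypothetical helper, not part of the original analysis.
# Assumption: values outside [edges[0], edges[-1]) are ignored, and empty bins
# are returned as NaN without raising a warning.
import numpy as np  # already imported above; repeated so the sketch stands alone

def binned_means(values, edges):
    values = np.asarray(values, dtype=float).ravel()
    idx = np.digitize(values, edges)        # in-range values get 1..len(edges)-1
    n_bins = len(edges) - 1
    inside = (idx >= 1) & (idx <= n_bins)   # drop out-of-range values
    sums = np.bincount(idx[inside] - 1, weights=values[inside], minlength=n_bins)
    counts = np.bincount(idx[inside] - 1, minlength=n_bins)
    with np.errstate(invalid='ignore', divide='ignore'):
        return sums / counts                # 0 / 0 becomes NaN for empty bins

# usage: bin_means_vectorized = binned_means(scores, bins)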
# get the histogram, https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
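hist = np.histogram(scores, bins='sturges') # hist[0] holds the counts, hist[1] the bin edges
# export without the first bin: hist[1][1] is its right edge, and bins are half-open,
# so a score equal to that edge already belongs to the second bin
np.savetxt('bins.csv', scores[scores >= hist[1][1]], delimiter=';')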
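# A small sketch of how many bins bins='sturges' produces, for reference.
# `sturges_bin_count` is a hypothetical helper; it assumes NumPy's documented
# Sturges width, ptp / (log2(n) + 1), which gives ceil(log2(n) + 1) bins for
# floating-point data with a non-zero range.
import math

def sturges_bin_count(n):
    return int(math.ceil(math.log2(n) + 1))

# usage: sturges_bin_count(scores.size) should match len(hist[0])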
# if visualized in Tableau, the number of bins is computed as: Number of Bins = 3 + log2(n) * log(n)
# ref: http://onlinehelp.tableau.com/current/pro/desktop/en-us/calculations_bins.html
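# A rough sketch of the Tableau rule quoted above, to compare against Sturges.
# Assumptions: `tableau_bin_count` is a hypothetical helper name; log(n) is read
# as base 10 (the formula as quoted does not state the base), and the result is
# rounded down. Tableau's actual rounding may differ.
import math

def tableau_bin_count(n):
    return int(3 + math.log2(n) * math.log10(n))

# usage: tableau_bin_count(scores.size)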
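# plot with seaborn, again without the first bin (>= because bins are half-open)
# note: distplot is deprecated since seaborn 0.11; sns.histplot(..., kde=True, stat='density') is the closest replacement
dist = sns.distplot(scores[scores >= hist[1][1]])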
dist.figure.savefig('output.png')