@olgabot
Created July 2, 2014 23:00
Bin a pandas dataframe on its columns
import numpy as np
import pandas as pd


def binify(df, bins):
    """Makes a histogram of each column using the provided bins

    Parameters
    ----------
    df : pandas.DataFrame
        A samples x features dataframe. Each feature will be binned into the
        provided bins
    bins : iterable
        Bins you would like to use for this data. Must include the final bin
        value, e.g. (0, 0.5, 1) for the two bins (0, 0.5) and (0.5, 1)

    Returns
    -------
    binned : pandas.DataFrame
        A len(bins)-1 x features DataFrame of each feature binned across
        samples, normalized as a density (integrates to 1 over the bins)
    """
    nrow = len(bins) - 1
    ncol = df.shape[1]
    binned = np.zeros((nrow, ncol))

    # TODO: make sure this works for numpy matrices
    # DataFrame.items() replaces iteritems(), which was removed in pandas 2.0
    for i, (name, column) in enumerate(df.items()):
        # density=True replaces normed=True, which was deprecated and later
        # removed from np.histogram
        binned[:, i] = np.histogram(column, bins=bins, density=True)[0]

    # Label each bin by its edges, e.g. '0-0.5'
    index = ['{}-{}'.format(lo, hi) for lo, hi in zip(bins, bins[1:])]
    binned = pd.DataFrame(binned, columns=df.columns, index=index)
    return binned
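
The values the function returns are densities, not counts or fractions, which is easy to misread. As a minimal sketch of what that means, the snippet below applies the same `np.histogram` call to one column of a made-up DataFrame (the data and the names `example`, `density`, and `fraction` are illustrative assumptions, not part of the gist) and converts the densities to per-bin fractions by multiplying by the bin widths.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 4 samples x 2 features, all values in [0, 1]
example = pd.DataFrame({"a": [0.1, 0.2, 0.7, 0.9],
                        "b": [0.6, 0.7, 0.8, 0.9]})
bins = (0, 0.5, 1)

# Same call the function makes for each column
density, edges = np.histogram(example["a"], bins=bins, density=True)

# A density is count / (n_samples * bin_width), so multiplying by the
# bin widths recovers the fraction of samples that fall in each bin
widths = np.diff(edges)
fraction = density * widths

print(density)   # two samples per bin, width 0.5 -> [1. 1.]
print(fraction)  # half of the samples in each bin -> [0.5 0.5]
```

If fractions are what you want downstream, one option is to multiply the returned DataFrame by `np.diff(bins)` along the rows; with equal-width bins this is just a constant rescaling.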