@jjsantanna
Created February 11, 2019 06:09
Calculate top-N statistics for a DataFrame column (a pandas Series): the hit count and percentage of the N most frequent values, with all remaining values grouped as 'others'.
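import pandas as pd


def top_n_dataframe(n, dataframe_field):
    """Return hits and percent for the n most frequent values, plus an 'others' row."""
    field_name = dataframe_field.name
    # value_counts() sorts by frequency, most frequent first
    counts = dataframe_field.value_counts()
    # The n most frequent values as a two-column frame
    top = counts.iloc[:n].to_frame().reset_index()
    top.columns = [field_name, 'hits']
    # Everything beyond the top n collapses into a single 'others' row
    new_row = pd.DataFrame(data={
        field_name: ['others'],
        'hits': [counts.iloc[n:].sum()],
    })
    top_result = pd.concat([top, new_row])
    # Summing per value also merges a literal 'others' value into the 'others' row
    df = top_result.groupby(field_name).sum()
    df = df.sort_values(by='hits', ascending=False)
    # percentage field
    df['percent'] = df['hits'] / df['hits'].sum() * 100
    return df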
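To make the grouping and percentage steps concrete, here is a dependency-free sketch of the same idea on a small sample. It is not the gist's pandas implementation: the helper name `top_n_counts`, the use of `collections.Counter`, and the sample data are all assumptions made for illustration, and unlike the pandas version it does not merge a literal 'others' value into the 'others' row.

```python
from collections import Counter


def top_n_counts(n, values):
    """Hypothetical helper: (value, hits, percent) rows for the n most common values plus 'others'."""
    counts = Counter(values)
    ranked = counts.most_common()  # (value, hits) pairs, most frequent first
    rows = ranked[:n]
    # Everything beyond the top n collapses into a single 'others' row
    rows.append(("others", sum(hits for _, hits in ranked[n:])))
    # Order by hits, descending; the sort is stable, so ties keep their order
    rows.sort(key=lambda row: row[1], reverse=True)
    total = sum(counts.values())
    # Guard against empty input to avoid dividing by zero
    return [(value, hits, hits * 100 / total if total else 0.0)
            for value, hits in rows]


# Illustrative sample data (an assumption, not from the gist)
sample = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "d"]
result = top_n_counts(2, sample)
for value, hits, percent in result:
    print(f"{value:>6}  {hits:>3}  {percent:5.1f}%")
```

With n = 2, the values "c" and "d" fall outside the top two, so their three hits end up in the 'others' row, and the percentages are taken over all ten hits.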