@raybuhr
Last active October 25, 2019 14:23
minimally sufficient pandas

the tl;dr of https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428

to select a column of data, use brackets: df['column_name']

to select rows of data, use .loc, or slice a datetime index with brackets: df['2019-01-01':'2019-02-28']

  • if performance is the primary concern, use a NumPy array instead of pandas
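
A minimal sketch of the selection rules above. The daily index and the values are made up for illustration, and `to_numpy()` is just one way to get at the underlying array:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data; dates and values are invented for this example.
idx = pd.date_range("2019-01-01", "2019-03-31", freq="D")
df = pd.DataFrame({"column_name": np.arange(len(idx))}, index=idx)

# Brackets with a column name return that column as a Series.
col = df["column_name"]

# .loc selects rows by label; with a datetime index both endpoints are included.
rows = df.loc["2019-01-01":"2019-02-28"]

# Brackets with a date slice select the same rows.
rows_brackets = df["2019-01-01":"2019-02-28"]

# For speed-sensitive element access, work on the plain NumPy array.
values = df["column_name"].to_numpy()
```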

use read_csv and its many arguments for reading files
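
A small sketch of a few commonly used read_csv arguments. The CSV text, the column names, and the "missing" marker are all invented here, and `io.StringIO` stands in for a real file path:

```python
import io

import pandas as pd

# Hypothetical file contents; in practice you would pass a path instead.
csv_text = """date,fruit,weight
2019-01-01,apple,1.5
2019-01-02,banana,missing
2019-01-03,cherry,0.2
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],       # parse this column as datetimes
    index_col="date",           # and use it as the index
    na_values=["missing"],      # treat this extra marker as NaN
    dtype={"fruit": "string"},  # set a column dtype up front
)
```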

use the .isna method to find and filter rows with NaN values

use ~ to negate a boolean mask, e.g. ~df['column_name'].isna()
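
For example, with a made-up frame (the column names are assumptions), one mask built with .isna gives both the missing and the non-missing rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing weight.
df = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry"],
    "weight": [1.5, np.nan, 0.2],
})

mask = df["weight"].isna()  # True where weight is NaN

missing = df[mask]          # rows where weight is NaN
present = df[~mask]         # ~ flips the mask: rows where weight is not NaN
```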

prefer arithmetic and comparison operators (+ - * / ** // % < > == !=) over their method equivalents (add sub mul div lt gt eq ne)
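
A quick check that the operators and their method equivalents give the same result. The Series is made up for illustration:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

doubled = s * 2                 # operator form
doubled_method = s.mul(2)       # method form, same result

is_big = s > 1                  # operator form
is_big_method = s.gt(1)         # method form, same result
```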

use pandas aggregation methods instead of Python's built-in functions

  • df['column_name'].sum() instead of sum(df['column_name'])
  • df['column_name'].max() instead of max(df['column_name'])
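
One observable difference, sketched on invented data: the pandas methods skip NaN by default, while the built-in sum lets it propagate.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"column_name": [1.0, np.nan, 3.0]})

total = df["column_name"].sum()         # pandas method: NaN is skipped
largest = df["column_name"].max()       # pandas method: NaN is skipped

builtin_total = sum(df["column_name"])  # built-in: NaN propagates into the result
builtin_is_nan = math.isnan(builtin_total)
```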

prefer df.groupby(...).agg(...) for group-by aggregation

  • Good: df.groupby('grouping column').agg({'aggregating column': 'aggregating function'})
    • e.g. df.groupby('fruit').agg({'tastiness': 'mean'})
    • e.g. df.groupby('fruit').agg({'tastiness': 'mean', 'weight': ['mean', 'median']})
  • OK: df.groupby('grouping column')['aggregating column'].agg('aggregating function')
    • e.g. df.groupby('fruit')['tastiness'].mean()
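
The fruit examples above, run on a small made-up frame (the values are assumptions) to show what each form returns:

```python
import pandas as pd

# Hypothetical data matching the column names used in the examples.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "banana", "banana"],
    "tastiness": [7, 9, 4, 6],
    "weight": [1.0, 3.0, 2.0, 2.0],
})

# Dict form: a DataFrame with one column per aggregated column.
mean_taste = df.groupby("fruit").agg({"tastiness": "mean"})

# Several functions per column: the result has two-level column labels.
multi = df.groupby("fruit").agg({"tastiness": "mean", "weight": ["mean", "median"]})

# Column-then-method form: a Series indexed by fruit.
series_form = df.groupby("fruit")["tastiness"].mean()
```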

for going from wide to long format, prefer melt over stack
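
A minimal melt sketch. The frame and the names passed to var_name and value_name are made up for illustration:

```python
import pandas as pd

# Hypothetical wide table: one weight column per year.
wide = pd.DataFrame({
    "fruit": ["apple", "banana"],
    "2018": [1.0, 2.0],
    "2019": [1.5, 2.5],
})

# Keep "fruit" as the identifier; every other column becomes a (year, weight) row.
long = wide.melt(id_vars="fruit", var_name="year", value_name="weight")
```

melt lets you name the new columns directly, which stack does not.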

for going from long to wide format, prefer pivot_table over unstack or pivot
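
A minimal pivot_table sketch on invented data. The duplicate banana/2019 rows are there on purpose: pivot_table aggregates them, whereas pivot would raise an error on duplicate index/column pairs.

```python
import pandas as pd

# Hypothetical long table with a repeated (banana, 2019) pair.
long = pd.DataFrame({
    "fruit": ["apple", "apple", "banana", "banana", "banana"],
    "year": ["2018", "2019", "2018", "2019", "2019"],
    "weight": [1.0, 1.5, 2.0, 2.5, 3.5],
})

# One row per fruit, one column per year; duplicates are averaged.
wide = long.pivot_table(index="fruit", columns="year", values="weight", aggfunc="mean")
```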
