@rsperl
Last active July 25, 2022 12:47
panda tricks #python #panda #snippet

Python Pandas Tips and Tricks

source


Categories

Reading files

🐼🤹‍♂️ pandas trick:

5 useful "read_csv" parameters that are often overlooked:

➡️ names: specify column names
➡️ usecols: which columns to keep
➡️ dtype: specify data types
➡️ nrows: # of rows to read
➡️ na_values: strings to recognize as NaN
#Python #DataScience #pandastricks

- Kevin Markham (@justmarkham) August 19, 2019
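Here is a minimal sketch of the five parameters working together. It reads a hypothetical in-memory CSV through io.StringIO, so no file on disk is needed. The column names and values are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV: no header row, and the string "missing" marks absent values
csv_text = """1,apple,3.5,x
2,banana,missing,y
3,cherry,1.25,z
4,date,2.0,w
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    names=["id", "fruit", "price", "code"],  # column names (the file has none)
    usecols=["id", "fruit", "price"],         # keep only these columns
    dtype={"fruit": "category"},              # set data types while reading
    nrows=3,                                  # read only the first 3 data rows
    na_values=["missing"],                    # treat "missing" as NaN
)
print(df.dtypes)
```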

🐼🤹‍♂️ pandas trick:

⚠️ Got bad data (or empty rows) at the top of your CSV file? Use these read_csv parameters:

➡️ header = row number of header (start counting at 0)
➡️ skiprows = list of row numbers to skip

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/t1M6XkkPYG

- Kevin Markham (@justmarkham) September 3, 2019
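The tweet's example image is not reproduced here. As a stand-in, this sketch uses a hypothetical CSV with two junk lines above the real header. It shows that header=2 and skiprows=[0, 1] produce the same DataFrame:

```python
import io

import pandas as pd

# Hypothetical export with two junk lines above the real header
csv_text = """exported by some tool
notes: ignore this line
a,b,c
1,2,3
4,5,6
"""

# header=2: the column names are on line 2 (counting from 0);
# the lines above it are discarded
df1 = pd.read_csv(io.StringIO(csv_text), header=2)

# skiprows=[0, 1]: drop lines 0 and 1, so the next line becomes the header
df2 = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1])
print(df1)
```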

🐼🤹‍♂️ pandas trick:

Two easy ways to reduce DataFrame memory usage:
1. Only read in the columns you need
2. Use the 'category' data type with categorical data

Example:
df = pd.read_csv('file.csv', usecols=['A', 'C', 'D'], dtype={'D':'category'})
#Python #pandastricks

- Kevin Markham (@justmarkham) June 21, 2019

🐼🤹‍♂️ pandas trick:

You can read directly from a compressed file:
df = pd.read_csv('file.csv.zip')

Or write to a compressed file:
df.to_csv('file.csv.zip')

Also supported: .gz, .bz2, .xz
#Python #pandas #pandastricks

- Kevin Markham (@justmarkham) July 4, 2019
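Here is a small round-trip sketch, assuming a writable temporary directory. It uses a hypothetical .gz file name. pandas infers the compression from the extension, because compression='infer' is the default for both to_csv and read_csv:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.gz")  # hypothetical file name

    # Compression is inferred from the ".gz" extension when writing...
    df.to_csv(path, index=False)

    # ...the file starts with the gzip magic bytes, so it really is compressed...
    with open(path, "rb") as fh:
        magic = fh.read(2)

    # ...and compression is inferred again when reading it back
    round_trip = pd.read_csv(path)

print(round_trip)
```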

🐼🤹‍♂️ pandas trick:

Are your dataset rows spread across multiple files, but you need a single DataFrame?

Solution:
1. Use glob() to list your files
2. Use a generator expression to read the files and concat() to combine them
3. 🥳

See example 👇
#Python #DataScience #pandastricks pic.twitter.com/qtKpzEoSC3

- Kevin Markham (@justmarkham) June 20, 2019
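Here is a sketch of the three steps. It assumes a temporary directory holding three small hypothetical CSV files, and the data_*.csv naming pattern is made up for this example. The file list is sorted so that the row order is predictable:

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Create three hypothetical files that share the same columns
    for i in range(3):
        part = pd.DataFrame({"id": [2 * i, 2 * i + 1], "part": i})
        part.to_csv(os.path.join(tmp, f"data_{i}.csv"), index=False)

    # 1. List the files (glob does not guarantee an order, so sort them)
    files = sorted(glob.glob(os.path.join(tmp, "data_*.csv")))

    # 2. Read each file lazily with a generator expression, then concatenate;
    #    ignore_index=True builds a fresh 0..n-1 index for the combined rows
    combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

print(combined)
```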

🐼🤹‍♂️ pandas trick:

Need to quickly get data from Excel or Google Sheets into pandas?

1. Copy data to clipboard
2. df = pd.read_clipboard()
3. 🥳

See example 👇

Learn 25 more tips & tricks: https://t.co/6akbxXG6SI
#Python #DataScience #pandas #pandastricks pic.twitter.com/M2Yw0NAXRe

- Kevin Markham (@justmarkham) July 15, 2019

Reading from the web

🐼🤹‍♂️ pandas trick:

Want to read a JSON file from the web? Use read_json() to read it directly from a URL into a DataFrame! 😎

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/gei6eeudiq

- Kevin Markham (@justmarkham) September 9, 2019

🐼🤹‍♂️ pandas trick #68:

Want to scrape a table from a web page? Try read_html()!

Definitely worth trying before bringing out a more complex tool (Beautiful Soup, Selenium, etc.)

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/sPKrea9wk1

- Kevin Markham (@justmarkham) September 18, 2019

Creating example DataFrames

🐼🤹‍♂️ pandas trick:

Need to create an example DataFrame? Here are 3 easy options:

pd.DataFrame({'col_one':[10, 20], 'col_two':[30, 40]})
pd.DataFrame(np.random.rand(2, 3), columns=list('abc'))
pd.util.testing.makeMixedDataFrame()

See output 👇
#Python #pandas #pandastricks pic.twitter.com/SSlZsd6OEj

- Kevin Markham (@justmarkham) June 28, 2019

🐼🤹‍♂️ pandas trick:

Need to create a DataFrame for testing?

pd.util.testing.makeDataFrame() ➡️ contains random values
.makeMissingDataframe() ➡️ some values missing
.makeTimeDataFrame() ➡️ has DatetimeIndex
.makeMixedDataFrame() ➡️ mixed data types

(Note: pd.util.testing was deprecated in pandas 1.0 and removed in 2.0.)
#Python #pandas #pandastricks

- Kevin Markham (@justmarkham) July 10, 2019

Creating columns

🐼🤹‍♂️ pandas trick:

Want to create new columns (or overwrite existing columns) within a method chain? Use "assign"!

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/y0wEfbz0VA

- Kevin Markham (@justmarkham) September 17, 2019
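The example image is not available here, so this is a minimal sketch using hypothetical price/qty data. Each keyword passed to assign becomes a column. Within a single assign call, later keywords can refer to columns created by earlier ones (Python 3.6+ and pandas 0.23+). The original DataFrame is left unchanged:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

result = (
    df
    .query("qty > 1")
    # assign returns a new DataFrame, so it can sit in the middle of a chain;
    # each lambda receives the DataFrame at this point in the chain
    .assign(
        total=lambda d: d["price"] * d["qty"],  # new column
        half_total=lambda d: d["total"] / 2,    # uses the column created just above
        price=lambda d: d["price"] * 2,         # overwrites an existing column
    )
)
print(result)
```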

🐼🤹‍♂️ pandas trick:

Need to create a bunch of new columns based on existing columns? Use this pattern:

for col in df.columns:
    df[f'{col}_new'] = df[col].apply(my_function)

See example 👇

Thanks to @pmbaumgartner for this trick!
#Python #DataScience #pandas #pandastricks pic.twitter.com/7qvKn9UypE

- Kevin Markham (@justmarkham) September 16, 2019

Renaming columns

🐼🤹‍♂️ pandas trick:

3 ways to rename columns:

1. Most flexible option:
df = df.rename({'A':'a', 'B':'b'}, axis='columns')

2. Overwrite all column names:
df.columns = ['a', 'b']

3. Apply string method:
df.columns = df.columns.str.lower()
#Python #DataScience #pandastricks

- Kevin Markham (@justmarkham) July 16, 2019

🐼🤹‍♂️ pandas trick:

Add a prefix to all of your column names:
df.add_prefix('X_')

Add a suffix to all of your column names:
df.add_suffix('_Y')
#Python #DataScience

- Kevin Markham (@justmarkham) June 11, 2019

🐼🤹‍♂️ pandas trick:

Need to rename all of your columns in the same way? Use a string method:

Replace spaces with _:
df.columns = df.columns.str.replace(' ', '_')

Make lowercase & remove trailing whitespace:
df.columns = df.columns.str.lower().str.rstrip()
#Python #pandastricks

- Kevin Markham (@justmarkham) June 25, 2019

Selecting rows and columns

🐼🤹‍♂️ pandas trick:

You can use f-strings (Python 3.6+) when selecting a Series from a DataFrame!

See example 👇
#Python #DataScience #pandas #pandastricks @python_tip pic.twitter.com/8qHEXiGBaB

- Kevin Markham (@justmarkham) September 13, 2019

🐼🤹‍♂️ pandas trick:

Need to select multiple rows/columns? "loc" is usually the solution:

select a slice (inclusive):
df.loc[0:4, 'col_A':'col_D']

select a list:
df.loc[[0, 3], ['col_A', 'col_C']]

select by condition:
df.loc[df.col_A=='val', 'col_D']
#Python #pandastricks

- Kevin Markham (@justmarkham) July 3, 2019

🐼🤹‍♂️ pandas trick:

"loc" selects by label, and "iloc" selects by position.

But what if you need to select by label *and* position? You can still use loc or iloc!

See example 👇

P.S. Don't use "ix": it was deprecated in 2017 (pandas 0.20) and removed in pandas 1.0.
#Python #DataScience #pandas #pandastricks pic.twitter.com/SpFkjWYEE0

- Kevin Markham (@justmarkham) August 1, 2019
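The tweet's image is not reproduced here. This sketch shows one common way to mix the two, using a hypothetical DataFrame. For loc, convert positions to labels through df.index. For iloc, convert labels to positions through df.columns.get_loc:

```python
import pandas as pd

df = pd.DataFrame(
    {"col_A": [1, 2, 3, 4], "col_B": [5, 6, 7, 8], "col_C": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)

# Rows by position, columns by label: turn the positions into labels for loc
by_loc = df.loc[df.index[0:2], ["col_A", "col_C"]]

# Same selection with iloc: turn the labels into positions instead
col_positions = [df.columns.get_loc(c) for c in ["col_A", "col_C"]]
by_iloc = df.iloc[0:2, col_positions]
print(by_loc)
```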

🐼🤹‍♂️ pandas trick:

Reverse column order in a DataFrame:
df.loc[:, ::-1]

Reverse row order:
df.loc[::-1]

Reverse row order and reset the index:
df.loc[::-1].reset_index(drop=True)

Want more #pandastricks? Working on a video right now, stay tuned... 🎥
#Python #DataScience

- Kevin Markham (@justmarkham) June 12, 2019

Filtering rows by condition

🐼🤹‍♂️ pandas trick:

Filter DataFrame by multiple OR conditions:
df[(df.color == 'red') | (df.color == 'green') | (df.color == 'blue')]

Shorter way:
df[df.color.isin(['red', 'green', 'blue'])]

Invert the filter:
df[~df.color.isin(['red', 'green', 'blue'])]
#Python #pandastricks

- Kevin Markham (@justmarkham) June 13, 2019

🐼🤹‍♂️ pandas trick:

Are you trying to filter a DataFrame using lots of criteria? It can be hard to write ✏️ and to read! 🔍

Instead, save the criteria as objects and use them to filter. Or, use reduce() to combine the criteria!

See example 👇
#Python #DataScience #pandastricks pic.twitter.com/U9NV27RIjQ

- Kevin Markham (@justmarkham) August 28, 2019
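Here is a sketch of both approaches with hypothetical drinks-style data. Each criterion is saved as a named boolean Series. The criteria are then combined either explicitly with & or by folding them with functools.reduce and operator.and_:

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    "continent": ["Europe", "Europe", "Asia", "Europe"],
    "beer": [250, 90, 300, 150],
    "wine": [100, 40, 50, 200],
})

# Save each criterion as a named boolean Series
is_europe = df["continent"] == "Europe"
likes_beer = df["beer"] > 100
likes_wine = df["wine"] > 60

# Option 1: combine the saved criteria explicitly
filtered_1 = df[is_europe & likes_beer & likes_wine]

# Option 2: fold any number of criteria together with reduce()
filtered_2 = df[reduce(operator.and_, [is_europe, likes_beer, likes_wine])]
print(filtered_1)
```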

🐼🤹‍♂️ pandas trick:

Want to filter a DataFrame that doesn't have a name?

Use the query() method to avoid creating an intermediate variable!

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/NyUOOSr7Sc

- Kevin Markham (@justmarkham) July 25, 2019

🐼🤹‍♂️ pandas trick:

Need to refer to a local variable within a query() string? Just prefix it with the @ symbol!

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/PfXcASWDdC

- Kevin Markham (@justmarkham) August 13, 2019

🐼🤹‍♂️ pandas trick:

If you want to use query() on a column name containing a space, just surround it with backticks! (New in pandas 0.25)

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/M5ZSRVr3no

- Kevin Markham (@justmarkham) July 30, 2019
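The three query() tricks can be combined in one short sketch. The data is hypothetical, and the column name containing a space requires pandas 0.25 or later:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "unit price": [3.0, 8.0, 5.0, 12.0],
    "qty": [10, 2, 7, 2],
})

threshold = 4.0  # a local variable referenced inside the query string

result = (
    df.assign(total=lambda d: d["unit price"] * d["qty"])
    # query() filters the unnamed DataFrame returned by assign, so no
    # intermediate variable is needed; @threshold refers to the local
    # variable, and backticks quote the column name that contains a space
    .query("`unit price` > @threshold and total > 20")
)
print(result)
```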

Manipulating strings

🐼🤹‍♂️ pandas trick:

Want to concatenate two string columns?

Option 1: Use a string method 🧶
Option 2: Use plus signs ➕

See example 👇

Which option do you prefer, and why?
#Python #DataScience #pandas #pandastricks pic.twitter.com/SsjBAMqkxB

- Kevin Markham (@justmarkham) August 22, 2019

🐼🤹‍♂️ pandas trick:

Need to split a string into multiple columns? Use the str.split() method with expand=True to return a DataFrame, then assign its columns to the original DataFrame.

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/wZ4okQZ9Dy

- Kevin Markham (@justmarkham) July 9, 2019
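Here is a minimal sketch of the split-and-assign pattern. The name and location columns are hypothetical. The last line shows how to keep only one piece of the split:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John Arthur Doe", "Jane Ann Smith"],
                   "location": ["Los Angeles, CA", "Washington, DC"]})

# expand=True returns a DataFrame with one column per piece
parts = df["name"].str.split(" ", expand=True)

# Assign the pieces to new columns of the original DataFrame
df[["first", "middle", "last"]] = parts

# Only need one piece? Select it by position from the expanded result
df["city"] = df["location"].str.split(", ", expand=True)[0]
print(df)
```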

Working with data types

🐼🤹‍♂️ pandas trick:

Numbers stored as strings? Try astype():
df.astype({'col1':'int', 'col2':'float'})

But it will fail if you have any invalid input. Better way:
df.apply(pd.to_numeric, errors='coerce')

Converts invalid input to NaN 🎉
#Python #pandastricks

- Kevin Markham (@justmarkham) June 17, 2019

🐼🤹‍♂️ pandas trick:

Select columns by data type:
df.select_dtypes(include='number')
df.select_dtypes(include=['number', 'category', 'object'])
df.select_dtypes(exclude=['datetime', 'timedelta'])
#Python #DataScience #pandas #pandastricks

- Kevin Markham (@justmarkham) June 14, 2019

🐼🤹‍♂️ pandas trick:

Two useful properties of ordered categories:
1️⃣ You can sort the values in logical (not alphabetical) order
2️⃣ Comparison operators also work logically

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/HeYZ3P3gPP

- Kevin Markham (@justmarkham) August 8, 2019
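Here is a minimal sketch of both properties, using hypothetical quality ratings. The logical order is defined with pandas.api.types.CategoricalDtype. Sorting the same strings alphabetically would instead give excellent, good, good, very good:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"id": [100, 101, 102, 103],
                   "quality": ["good", "very good", "good", "excellent"]})

# Define the logical order of the categories
quality_type = CategoricalDtype(["good", "very good", "excellent"], ordered=True)
df["quality"] = df["quality"].astype(quality_type)

# 1. Sorting follows the logical order, not the alphabetical one
sorted_df = df.sort_values("quality")

# 2. Comparison operators also follow the logical order
better_than_good = df[df["quality"] > "good"]
print(sorted_df)
```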

Encoding data

🐼🤹‍♂️ pandas trick:

Need to convert a column from continuous to categorical? Use cut():

df['age_groups'] = pd.cut(df.age, bins=[0, 18, 65, 99], labels=['child', 'adult', 'elderly'])

0 to 18 ➡️ 'child'
18 to 65 ➡️ 'adult'
65 to 99 ➡️ 'elderly'

(Bins include their right edge by default, so 18 ➡️ 'child' and 65 ➡️ 'adult'; an age of exactly 0 falls outside every bin and becomes NaN.)
#Python #pandas #pandastricks

- Kevin Markham (@justmarkham) July 2, 2019

🐼🤹‍♂️ pandas trick:

Want to dummy encode (or "one hot encode") your DataFrame? Use pd.get_dummies(df) to encode all object & category columns.

Want to drop the first level since it provides redundant info? Set drop_first=True.

See example & read thread 👇
#Python #pandastricks pic.twitter.com/g0XjJ44eg2

- Kevin Markham (@justmarkham) August 5, 2019

🐼🤹‍♂️ pandas trick:

Need to apply the same mapping to multiple columns at once? Use "applymap" (DataFrame method) with "get" (dictionary method).

(In pandas 2.1+, "applymap" is deprecated in favor of the equivalent DataFrame.map method.)

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/WU4AmeHP4O

- Kevin Markham (@justmarkham) August 30, 2019

Extracting data from lists

🐼🤹‍♂️ pandas trick:

Has your data ever been TRAPPED in a Series of Python lists? 🔒

Expand the Series into a DataFrame by using apply() and passing it the Series constructor 🔓

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/ZvysqaRz6S

- Kevin Markham (@justmarkham) June 27, 2019
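Here is a sketch using a hypothetical column of two-element lists. The new column names are also made up. As an alternative not taken from the tweet, pd.DataFrame(df["col_one"].tolist(), index=df.index) builds the same expanded frame without apply():

```python
import pandas as pd

df = pd.DataFrame({"col_one": [[10, 40], [20, 50], [30, 60]]})

# Passing the Series constructor to apply() turns each list into a row
expanded = df["col_one"].apply(pd.Series)
expanded.columns = ["first", "second"]  # hypothetical names for the new columns

# Attach the new columns to the original DataFrame
df = pd.concat([df, expanded], axis="columns")
print(df)
```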

🐼🤹‍♂️ pandas trick:

Do you have a Series containing lists of items? Create one row for each item using the "explode" method 💥

New in pandas 0.25! See example 👇

🤯
#Python #DataScience #pandas #pandastricks pic.twitter.com/ix5d8CLg57

- Kevin Markham (@justmarkham) August 12, 2019

🐼🤹‍♂️ pandas trick:

Does your Series contain comma-separated items? Create one row for each item:

✂️ "str.split" creates a list of strings
⬅️ "assign" overwrites the existing column
💥 "explode" creates the rows (new in pandas 0.25)

See example 👇
#Python #pandas #pandastricks pic.twitter.com/OqZNWdarP0

- Kevin Markham (@justmarkham) August 14, 2019

🐼🤹‍♂️ pandas trick:

💥 "explode" takes a list of items and creates one row for each item (new in pandas 0.25)

You can also do the reverse! See example 👇

Thanks to @EForEndeavour for this tip 🙌
#Python #DataScience #pandas #pandastricks pic.twitter.com/4UBxbzHS51

- Kevin Markham (@justmarkham) August 16, 2019
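Here is a sketch of the split, assign and explode pattern, followed by one possible reverse, using hypothetical sandwich data. The reverse shown here uses groupby plus agg with ",".join. It is one way to do it and may differ from the approach in the tweet's image:

```python
import pandas as pd

df = pd.DataFrame({"sandwich": ["PB&J", "BLT", "cheese"],
                   "ingredients": ["peanut butter,jelly",
                                   "bacon,lettuce,tomato",
                                   "swiss cheese"]})

# str.split makes lists, assign overwrites the column, explode makes the rows
long_df = (
    df.assign(ingredients=df["ingredients"].str.split(","))
      .explode("ingredients")
)

# The reverse: join the rows back into one comma-separated string per group
# (groupby sorts the keys, so the row order differs from the original)
wide_df = long_df.groupby("sandwich")["ingredients"].agg(",".join).reset_index()
print(long_df)
```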

Working with time series data

🐼🤹‍♂️ pandas trick:

If you need to create a single datetime column from multiple columns, you can use to_datetime() 📆

See example 👇

You must include: month, day, year
You can also include: hour, minute, second
#Python #DataScience #pandas #pandastricks pic.twitter.com/0bip6SRDdF

- Kevin Markham (@justmarkham) July 8, 2019
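Here is a minimal sketch using hypothetical date parts. Pass only the date-part columns, because to_datetime raises an error if the DataFrame contains columns it does not recognize:

```python
import pandas as pd

df = pd.DataFrame({"month": [5, 12], "day": [14, 31], "year": [2019, 2019],
                   "hour": [9, 23], "minute": [30, 59]})

# to_datetime assembles one datetime per row from columns named
# year/month/day (required) plus optional hour/minute/second
df["timestamp"] = pd.to_datetime(df[["month", "day", "year", "hour", "minute"]])
print(df["timestamp"])
```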

🐼🤹‍♂️ pandas trick:

One reason to use the datetime data type is that you can access many useful attributes via "dt", like:
df.column.dt.hour

Other attributes include: year, month, day, dayofyear, week, weekday, quarter, days_in_month...

(week was removed in pandas 2.0; use dt.isocalendar().week instead.)

See full list 👇
#Python #pandastricks pic.twitter.com/z405STKqKY

- Kevin Markham (@justmarkham) August 2, 2019

🐼🤹‍♂️ pandas trick:

Need to perform an aggregation (sum, mean, etc.) with a given frequency (monthly, yearly, etc.)?

Use resample! It's like a "groupby" for time series data. See example 👇

"Y" means yearly (spelled "YE" in pandas 2.2+). See list of frequencies: https://t.co/oPDx85yqFT
#Python #pandastricks pic.twitter.com/nweqbHXEtd

- Kevin Markham (@justmarkham) July 18, 2019
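Here is a sketch using hypothetical daily data. It resamples by month ("MS", labeled by month start) rather than by year. The "MS" alias works across pandas versions, while the year-end alias "Y" is deprecated in recent releases:

```python
import pandas as pd

# Hypothetical daily values for the first 90 days of 2019 (Jan 1 to Mar 31)
sales = pd.DataFrame(
    {"amount": range(90)},
    index=pd.date_range("2019-01-01", periods=90, freq="D"),
)

# Like a groupby on time: one output row per calendar month
monthly = sales.resample("MS")["amount"].sum()
print(monthly)
```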

🐼🤹‍♂️ pandas trick:

Want to calculate the difference between each row and the previous row? Use df.col_name.diff()

Want to calculate the percentage change instead? Use df.col_name.pct_change()

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/5EGYqpNPC3

- Kevin Markham (@justmarkham) August 27, 2019

🐼🤹‍♂️ pandas trick:

Need to convert a datetime Series from UTC to another time zone?

1. Set current time zone ➡️ tz_localize('UTC')
2. Convert ➡️ tz_convert('America/Chicago')

Automatically handles Daylight Saving Time!

See example 👇
#Python #DataScience #pandastricks pic.twitter.com/ztzMXcgkFY

- Kevin Markham (@justmarkham) July 31, 2019
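Here is a minimal sketch, assuming naive timestamps that were recorded in UTC. The January value converts with a 6-hour offset (CST), and the July value converts with a 5-hour offset (CDT):

```python
import pandas as pd

# Naive timestamps that we know were recorded in UTC
utc_times = pd.Series(pd.to_datetime(["2019-01-15 18:00", "2019-07-15 18:00"]))

# 1. Declare the current time zone, 2. convert to the target zone
chicago = utc_times.dt.tz_localize("UTC").dt.tz_convert("America/Chicago")
print(chicago)
```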

Handling missing values

🐼🤹‍♂️ pandas trick:

Calculate the fraction of missing values in each column (multiply by 100 for %):
df.isna().mean()

Drop columns with any missing values:
df.dropna(axis='columns')

Drop columns in which more than 10% of values are missing:
df.dropna(thresh=len(df)*0.9, axis='columns')
#Python #pandastricks

- Kevin Markham (@justmarkham) June 19, 2019

🐼🤹‍♂️ pandas trick:

Need to fill missing values in your time series data? Use df.interpolate()

Defaults to linear interpolation, but many other methods are supported!

Want more pandas tricks? Watch this:
👉 https://t.co/6akbxXXHKg 👈
#Python #DataScience #pandas #pandastricks pic.twitter.com/JjH08dvjMK

- Kevin Markham (@justmarkham) July 12, 2019

🐼🤹‍♂️ pandas trick:

Do you need to store missing values ("NaN") in an integer Series? Use the "Int64" data type!

See example 👇

(New in v0.24, API is experimental/subject to change)
#Python #DataScience #pandas #pandastricks pic.twitter.com/mN7Ud53Rls

- Kevin Markham (@justmarkham) August 15, 2019
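Here is a minimal sketch that compares the default behavior with the nullable "Int64" dtype (note the capital I):

```python
import pandas as pd

# With default inference, a missing value turns the integers into float64
as_float = pd.Series([1, 2, None])

# The nullable "Int64" dtype keeps integer values and marks the gap as missing
as_int = pd.Series([1, 2, None], dtype="Int64")
print(as_int)
```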

Using aggregation functions

🐼🤹‍♂️ pandas trick:

Instead of aggregating by a single function (such as 'mean'), you can aggregate by multiple functions by using 'agg' (and passing it a list of functions) or by using 'describe' (for summary statistics 📊)

See example 👇
#Python #DataScience #pandastricks pic.twitter.com/Emg3zLAocB

- Kevin Markham (@justmarkham) July 19, 2019

🐼🤹‍♂️ pandas trick:

Did you know that "last" is an aggregation function, just like "sum" and "mean"?

Can be used with a groupby to extract the last value in each group. See example 👇

P.S. You can also use "first" and "nth" functions!
#Python #DataScience #pandas #pandastricks pic.twitter.com/WKJtNIUxwz

- Kevin Markham (@justmarkham) August 9, 2019

🐼🤹‍♂️ pandas trick:

Are you applying multiple aggregations after a groupby? Try "named aggregation":

✅ Allows you to name the output columns
❌ Avoids a column MultiIndex

New in pandas 0.25! See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/VXJz6ShZbc

- Kevin Markham (@justmarkham) August 21, 2019
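Here is a minimal sketch with hypothetical animal data. Each keyword follows the form output_column=(input_column, function), and the result has flat, named columns instead of a MultiIndex:

```python
import pandas as pd

df = pd.DataFrame({"species": ["cat", "dog", "cat", "dog"],
                   "height": [9.1, 6.0, 9.5, 34.0],
                   "weight": [8.0, 7.0, 10.0, 198.0]})

# Named aggregation: output_column=(input_column, aggregation_function)
summary = df.groupby("species").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
    avg_weight=("weight", "mean"),
)
print(summary)
```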

🐼🤹‍♂️ pandas trick:

Want to combine the output of an aggregation with the original DataFrame?

Instead of: df.groupby('col1').col2.func()
Use: df.groupby('col1').col2.transform(func)

"transform" returns one value per row of the original DataFrame (instead of one per group), so its output can be added as a new column

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/9dkcAGpTYK

- Kevin Markham (@justmarkham) September 4, 2019
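Here is a sketch with hypothetical order data. It shows the shape difference between an aggregation and transform, and one use of the broadcast result: each item's share of its order total:

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 1, 2, 2, 2],
                       "item_price": [6.0, 2.0, 1.0, 1.0, 2.0]})

# Aggregation: one value per group (2 rows here)
per_order = orders.groupby("order_id")["item_price"].sum()

# transform: the same sums, repeated for every row of each group (5 rows)
orders["total_price"] = orders.groupby("order_id")["item_price"].transform("sum")

# Row-level and group-level values can now be combined directly
orders["share"] = orders["item_price"] / orders["total_price"]
print(orders)
```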

Using cumulative functions

🐼🤹‍♂️ pandas trick:

Need to calculate a running total (or "cumulative sum")? Use the cumsum() function! Also works with groupby()

See example 👇

Other cumulative functions: cummax(), cummin(), cumprod()
#Python #DataScience #pandas #pandastricks pic.twitter.com/H4whqlV2ky

- Kevin Markham (@justmarkham) September 6, 2019

🐼🤹‍♂️ pandas trick:

Need to calculate a running count within groups? Do this:
df.groupby('col').cumcount() + 1

See example 👇

Thanks to @kjbird15 and @EForEndeavour for this trick! 🙌
#Python #DataScience #pandas #pandastricks @python_tip pic.twitter.com/jSz231QmmS

- Kevin Markham (@justmarkham) September 11, 2019

Random sampling

🐼🤹‍♂️ pandas trick:

Randomly sample rows from a DataFrame:
df.sample(n=10)
df.sample(frac=0.25)

Useful parameters:
➡️ random_state: use any integer for reproducibility
➡️ replace: sample with replacement
➡️ weights: weight based on values in a column 😎
#Python #pandastricks pic.twitter.com/j2AyoTLRKb

- Kevin Markham (@justmarkham) August 20, 2019

🐼🤹‍♂️ pandas trick:

Want to shuffle your DataFrame rows?
df.sample(frac=1, random_state=0)

Want to reset the index after shuffling?
df.sample(frac=1, random_state=0).reset_index(drop=True)
#Python #DataScience #pandas #pandastricks

- Kevin Markham (@justmarkham) August 26, 2019

🐼🤹‍♂️ pandas trick:

Split a DataFrame into two random subsets:

df_1 = df.sample(frac=0.75, random_state=42)
df_2 = df.drop(df_1.index)

(Only works if df's index values are unique)

P.S. Working on a video of my 25 best #pandastricks, stay tuned! 📺
#Python #pandas #DataScience

- Kevin Markham (@justmarkham) June 18, 2019

Merging DataFrames

🐼🤹‍♂️ pandas trick:

When you are merging DataFrames, you can identify the source of each row (left/right/both) by setting indicator=True.

See example 👇

P.S. Learn 25 more #pandastricks in 25 minutes: https://t.co/6akbxXG6SI
#Python #DataScience #pandas pic.twitter.com/tkb2LiV4eh

- Kevin Markham (@justmarkham) July 23, 2019

🐼🤹‍♂️ pandas trick:

Merging datasets? Check that merge keys are unique in BOTH datasets:
pd.merge(left, right, validate='one_to_one')

✅ Use 'one_to_many' to only check uniqueness in LEFT
✅ Use 'many_to_one' to only check uniqueness in RIGHT
#Python #DataScience #pandastricks

- Kevin Markham (@justmarkham) June 26, 2019
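Here is a sketch of both merge options using hypothetical left/right tables. A failed validation raises pandas.errors.MergeError:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [4, 5, 6]})

# indicator=True adds a "_merge" column: left_only, right_only or both
merged = pd.merge(left, right, on="key", how="outer", indicator=True)
print(merged)

# A hypothetical right table in which the key "b" appears twice
right_dupes = pd.DataFrame({"key": ["b", "b"], "right_val": [7, 8]})

# validate="one_to_one" requires unique keys on BOTH sides, so this fails...
try:
    pd.merge(left, right_dupes, on="key", validate="one_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

# ...while "one_to_many" only checks the left side, so this succeeds
one_to_many = pd.merge(left, right_dupes, on="key", validate="one_to_many")
```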

Styling DataFrames

🐼🤹‍♂️ pandas trick:

Two simple ways to style a DataFrame:

1️⃣ df.style.hide_index()
2️⃣ df.style.set_caption('My caption')

(In pandas 2.0+, hide_index() has been removed; use df.style.hide(axis='index') instead.)

See example 👇

For more style options, watch trick #25: https://t.co/6akbxXG6SI 📺
#Python #DataScience #pandas #pandastricks pic.twitter.com/8yzyQYz9vr

- Kevin Markham (@justmarkham) August 6, 2019

🐼🤹‍♂️ pandas trick:

Want to add formatting to your DataFrame? For example:
- hide the index
- add a caption
- format numbers & dates
- highlight min & max values

Watch 👇 to learn how!

Code: https://t.co/HKroWYVIEs

25 more tricks: https://t.co/6akbxXG6SI
#Python #pandastricks pic.twitter.com/AKQr7zVR7S

- Kevin Markham (@justmarkham) July 17, 2019

Exploring a dataset

🐼🤹‍♂️ pandas trick:

Want to explore a new dataset without too much work?

1. Pick one:
➡️ pip install pandas-profiling
➡️ conda install -c conda-forge pandas-profiling

2. import pandas_profiling
3. df.profile_report()
4. 🥳

See example 👇
#Python #DataScience #pandastricks pic.twitter.com/srq5rptEUj

- Kevin Markham (@justmarkham) July 29, 2019

🐼🤹‍♂️ pandas trick:

Need to check if two Series contain the same elements?

❌ Don't do this:
df.A == df.B

✅ Do this:
df.A.equals(df.B)

✅ Also works for DataFrames:
df.equals(df2)

equals() returns a single True/False and treats NaNs in the same position as equal, whereas == compares element by element and NaN == NaN is False
#Python #DataScience #pandas #pandastricks

- Kevin Markham (@justmarkham) June 24, 2019

🐼🤹‍♂️ pandas trick #69:

Need to check if two Series are "similar"? Use this:

pd.testing.assert_series_equal(df.A, df.B, ...)

Useful arguments include:
➡️ check_names=False
➡️ check_dtype=False
➡️ check_exact=False

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/bdJBkiFxne

- Kevin Markham (@justmarkham) September 19, 2019

🐼🤹‍♂️ pandas trick:

Want to examine the "head" of a wide DataFrame, but can't see all of the columns?

Solution #1: Change display options to show all columns
Solution #2: Transpose the head (swaps rows and columns)

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/9sw7O7cPeh

- Kevin Markham (@justmarkham) July 24, 2019

🐼🤹‍♂️ pandas trick:

Want to plot a DataFrame? It's as easy as:
df.plot(kind='...')

You can use:
line 📈
bar 📊
barh
hist
box 📦
kde
area
scatter
hexbin
pie 🥧

Other plot types are available via pd.plotting!

Examples: https://t.co/fXYtPeVpZX
#Python #dataviz #pandastricks pic.twitter.com/kp82wA15S4

- Kevin Markham (@justmarkham) August 23, 2019

Handling warnings

🐼🤹‍♂️ pandas trick:

Did you encounter the dreaded SettingWithCopyWarning? 👻

The usual solution is to rewrite your assignment using "loc":

❌ df[df.col == val1].col = val2
✅ df.loc[df.col == val1, 'col'] = val2

See example 👇
#Python #DataScience #pandastricks @python_tip pic.twitter.com/6L6IukTpBO

- Kevin Markham (@justmarkham) September 10, 2019

🐼🤹‍♂️ pandas trick:

Did you get a "SettingWithCopyWarning" when creating a new column? You are probably assigning to a DataFrame that was created from a subset of another DataFrame.

Solution: Use the "copy" method when you create the second DataFrame from the first!

See example 👇
#Python #DataScience #pandastricks pic.twitter.com/LrRNFyN6Qn

- Kevin Markham (@justmarkham) September 12, 2019

Other

🐼🤹‍♂️ pandas trick:

If you've created a groupby object, you can access any of the groups (as a DataFrame) using the get_group() method.

See example 👇
#Python #DataScience #pandas #pandastricks pic.twitter.com/6Ya0kxMpgk

- Kevin Markham (@justmarkham) September 2, 2019

🐼🤹‍♂️ pandas trick:

Do you have a Series with a MultiIndex?

Reshape it into a DataFrame using the unstack() method. It's easier to read, plus you can interact with it using DataFrame methods!

See example 👇

P.S. Want a video with my top 25 #pandastricks? 📺
#Python #pandas pic.twitter.com/DKHwN03A7J

- Kevin Markham (@justmarkham) July 1, 2019

🐼🤹 pandas trick:

There are many display options you can change:
max_rows
max_columns
max_colwidth
precision
date_dayfirst
date_yearfirst

How to use:
pd.set_option('display.max_rows', 80)
pd.reset_option('display.max_rows')

See all:
pd.describe_option()
#Python #pandastricks

- Kevin Markham (@justmarkham) July 26, 2019

🐼🤹‍♂️ pandas trick:

Show total memory usage of a DataFrame:
df.info(memory_usage='deep')

Show memory used by each column:
df.memory_usage(deep=True)

Need to reduce? Drop unused columns, or convert object columns to 'category' type.
#Python #pandas #pandastricks

- Kevin Markham (@justmarkham) July 5, 2019

🐼🤹‍♂️ pandas trick #70:

Need to know which version of pandas you're using?

➡️ pd.__version__

Need to know the versions of its dependencies (numpy, matplotlib, etc.)?

➡️ pd.show_versions()

Helpful when reading the documentation! 📚
#Python #pandas #pandastricks

- Kevin Markham (@justmarkham) September 20, 2019

🐼🤹‍♂️ pandas trick:

Want to use NumPy without importing it? You can access ALL of its functionality from within pandas! See example 👇

This is probably *not* a good idea since it breaks with a long-standing convention. But it's a neat trick 😎

(pd.np was deprecated in pandas 1.0 and removed in 2.0, so this no longer works in current versions.)
#Python #pandas #pandastricks pic.twitter.com/pZbXwuj6Kz

- Kevin Markham (@justmarkham) July 22, 2019
