
@justinhchae
Last active January 13, 2021 14:22
import numpy as np
import pandas as pd
gitcsv = 'https://raw.githubusercontent.com/justinhchae/medium/main/bools.csv'
df = pd.read_csv(gitcsv)
# columns that are supposed to hold boolean flags
cols = ['flag1', 'flag2', 'flag3']
# Use nested np.where calls to match values and replace them:
# where a value is null, use pd.NA; otherwise, where it equals 1, use True;
# otherwise, keep the original value.
df[cols] = np.where(df[cols].isnull(), pd.NA,
                    np.where(df[cols] == 1., True, df[cols]))
# Finally, cast to pandas' nullable 'boolean' dtype rather than NumPy 'bool':
# 'bool' cannot represent missing values, while 'boolean' (backed by a
# BooleanArray) keeps them as <NA>.
df[cols] = df[cols].astype('boolean')
print(df.head())
print(df['flag1'].unique())
""" Output (nullable boolean dtype):
  category  flag1  flag2  flag3
0        d  False   <NA>   <NA>
1        d   True   <NA>   <NA>
2        c  False   <NA>   <NA>
3        b  False   <NA>   <NA>
4        b  False   <NA>   <NA>
<BooleanArray>
[False, True]
Length: 2, dtype: boolean
"""
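The nested `np.where` can be checked offline on a small hypothetical frame. This sketch assumes the flag columns arrive as `float64` values of `1.0`, `0.0`, and `NaN`, which is how `read_csv` typically types a numeric column with blank cells; the CSV itself isn't reproduced here, so the data below is made up. Under that assumption, NumPy upcasts the inner `np.where(..., True, ...)` result to `float64` (so `True` is held as `1.0` until the final cast), and calling `.astype('boolean')` directly on the float columns should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bools.csv: flags stored as 1.0 / 0.0 / NaN.
df = pd.DataFrame({
    "category": ["d", "d", "c", "b", "b"],
    "flag1": [0.0, 1.0, 0.0, 0.0, np.nan],
    "flag2": [np.nan, 1.0, np.nan, 0.0, np.nan],
})
cols = ["flag1", "flag2"]

# Same nested np.where as the gist: null -> pd.NA, 1.0 -> True, else keep.
# The inner call returns float64 (True becomes 1.0); the outer call returns
# an object array because pd.NA is not a float.
via_where = df[cols].copy()
via_where[cols] = np.where(via_where.isnull(), pd.NA,
                           np.where(via_where == 1.0, True, via_where))
via_where = via_where.astype("boolean")

# The nullable dtype also accepts 0/1 floats with NaN directly.
direct = df[cols].astype("boolean")

print(via_where)
print(direct.equals(via_where))
```

For clean 0/1/NaN data the direct cast is the shorter route; the explicit `np.where` mapping is more useful when raw values need custom rules before casting.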
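The difference between `bool` and `boolean` is easiest to see on one hypothetical column with a missing value. NumPy's `bool` has no slot for missing data, and `NaN` is non-zero, so `astype(bool)` silently turns it into `True`; pandas' nullable `boolean` dtype keeps the gap as `<NA>`. The column values here are illustrative, not taken from the gist's CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical flag column: 1.0 / 0.0 with one missing entry.
flags = pd.Series([1.0, 0.0, np.nan], name="flag1")

# NumPy bool: NaN is truthy, so the missing value is reported as True.
as_numpy_bool = flags.astype(bool)

# pandas nullable boolean: the missing value stays missing.
as_nullable = flags.astype("boolean")

print(as_numpy_bool.tolist())
print(as_nullable.tolist())
```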