

@winnydejong
Created February 25, 2019 14:30
Helpful function that reports, for every column in a pandas DataFrame, the count of nulls, the count of unique values, and the Python types of the values it contains
```python
# Helpful function to look through the columns of a pandas DataFrame
# By Roland Jeannier, https://medium.com/@rtjeannier/pandas-101-fbb5bf86a9bc
import pandas as pd


def eda_helper(df):
    dict_list = []
    for col in df.columns:
        data = df[col]
        dict_ = {}
        # The null count for a column (NaN and None both count as null).
        dict_.update({"null_count": data.isnull().sum()})
        # Count the unique values in a column; missing values are included.
        dict_.update({"unique_count": len(data.unique())})
        # Find the Python types of the values in the column.
        # This is useful for spotting type mismatches, e.g. numbers stored as strings.
        dict_.update({"data_type": {type(d).__name__ for d in data}})
        dict_list.append(dict_)
    eda_df = pd.DataFrame(dict_list, index=df.columns)
    # Fewest nulls first; ties broken by the most unique values.
    return eda_df.sort_values(by=["null_count", "unique_count"], ascending=[True, False])
```
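`eda_helper` builds one dictionary per column in a Python loop. pandas can compute the null and unique counts for all columns at once, so a vectorized variant is possible. The sketch below is one way it might look; it is not part of the original gist, and the name `eda_summary` is hypothetical. It should produce the same table as `eda_helper` for DataFrames with unique column names, because `DataFrame.nunique(dropna=False)` counts missing values the same way `len(series.unique())` does.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized variant of eda_helper (not from the original gist)."""
    summary = pd.DataFrame(
        {
            # Missing values per column (NaN and None both count).
            "null_count": df.isnull().sum(),
            # dropna=False keeps missing values in the count,
            # matching len(series.unique()) in the loop version.
            "unique_count": df.nunique(dropna=False),
            # Set of Python type names found among each column's values.
            "data_type": pd.Series(
                {col: {type(v).__name__ for v in df[col]} for col in df.columns},
                dtype=object,
            ),
        }
    )
    # Fewest nulls first; ties broken by the most unique values.
    return summary.sort_values(
        by=["null_count", "unique_count"], ascending=[True, False]
    )


# Small made-up sample: "qty" mixes integers with a string.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "price": [9.99, None, 4.5, 4.5],
        "qty": [1, "2", 3, None],
    }
)
print(eda_summary(df))
```

With this sample the rows come out in the order `id`, `qty`, `price`, and the `data_type` set for `qty` contains `int`, `str`, and `NoneType`, which points at the string `"2"` mixed in with integers. The missing value in `price` is reported as `float`, because pandas stores it as NaN. Set contents print in arbitrary order.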