Created
February 25, 2019 14:30
Helpful function to get datatype, count of nulls, and count of unique values for every column in a Pandas dataframe
# Helpful function to look through the columns of a Pandas dataframe
# By Roland Jeannier, https://medium.com/@rtjeannier/pandas-101-fbb5bf86a9bc
import pandas as pd


def eda_helper(df):
    dict_list = []
    for col in df.columns:
        data = df[col]
        dict_ = {}
        # The null count for a column.
        dict_.update({"null_count": data.isnull().sum()})
        # Counting the unique values in a column (a missing value counts as one unique value)
        dict_.update({"unique_count": len(data.unique())})
        # Finding the types of data in the column
        # This is useful for finding out potential problems with type mismatches
        dict_.update({"data_type": {type(d).__name__ for d in data}})
        dict_list.append(dict_)
    # One summary row per column of the input dataframe
    eda_df = pd.DataFrame(dict_list)
    eda_df.index = df.columns
    # Columns with the fewest nulls first; ties broken by most unique values
    eda_df.sort_values(by=['null_count', 'unique_count'], ascending=[True, False], inplace=True)
    return eda_df
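A minimal usage sketch, assuming a small made-up dataframe. The `sample` data and its column names are hypothetical and only serve to show what the summary surfaces. The helper is restated in condensed form so the snippet runs on its own; it is meant to behave the same as the function above, not to replace it.

```python
import pandas as pd


def eda_helper(df):
    """Condensed restatement of the helper above, so this sketch runs on its own."""
    rows = []
    for col in df.columns:
        data = df[col]
        rows.append({
            "null_count": data.isnull().sum(),
            "unique_count": len(data.unique()),
            "data_type": {type(d).__name__ for d in data},
        })
    eda_df = pd.DataFrame(rows, index=df.columns)
    return eda_df.sort_values(by=["null_count", "unique_count"], ascending=[True, False])


# Hypothetical sample data: "age" mixes ints with a string, "city" has a missing value.
sample = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Di"],
    "age": [31, "42", 27, 31],
    "city": ["Oslo", None, "Oslo", "Rome"],
})

summary = eda_helper(sample)
print(summary)
```

In this sample the `age` row should list both `int` and `str` under `data_type`, which is the kind of type mismatch the helper is meant to expose, and `city` sorts last because it is the only column with a null.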