Created
October 25, 2021 14:24
Remove non-printing characters from a Pandas dataframe
import numpy as np
import pandas as pd


def remove_non_printing_chars(df):
    """Clean the string columns of a dataframe by removing non-printing characters.

    We've encountered values like tabs in some of the data.

    :param df: Pandas dataframe
    :return: cleaned copy of the dataframe; the input is not modified
    """
    clean_df = df.copy(deep=True)
    # Only object (string) columns hold text; other dtypes are left untouched.
    for col in clean_df.select_dtypes(include="object").columns:
        clean_df[col] = (
            clean_df[col]
            .str.strip()
            # Drop control characters (tab, newline, DEL, \x80-\x9f) anywhere in the value.
            .str.replace(r"[\x00-\x1f\x7f-\x9f]", "", regex=True)
            # Treat cells that end up empty as missing values.
            .replace("", np.nan)
        )
    return clean_df
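As a rough illustration of the per-value cleaning step, here is a minimal sketch that uses only the standard library `re` module. The helper name `strip_control_chars` is hypothetical, and treating "non-printing" as the C0 and C1 control ranges plus DEL is an assumption, not something the gist states.

```python
import re

# Assumption: "non-printing" means the C0 and C1 control ranges plus DEL.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def strip_control_chars(value):
    """Return value without control characters; non-strings pass through."""
    if not isinstance(value, str):
        return value
    # Trim surrounding whitespace first, then drop any remaining control characters.
    return CONTROL_CHARS.sub("", value.strip())


print(repr(strip_control_chars("  foo\tbar\x7f ")))  # 'foobar'
print(repr(strip_control_chars("\t\n")))             # ''
print(strip_control_chars(42))                       # 42
```

A helper like this could be applied cell by cell, for example with `Series.map`, when a column mixes strings with other types and the `.str` accessor would turn the non-string values into NaN.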