@GDBSD
Created October 25, 2021 14:24
Remove non-printing characters from a Pandas dataframe
import numpy as np


def remove_non_printing_chars(df):
    """Clean a dataframe by removing non-printing characters from its string values.

    We've encountered values like tabs in some of the data.

    :param df: Pandas dataframe
    :return: Pandas dataframe (a cleaned copy; the input is left unchanged)
    """
    # Remove control characters (tabs, newlines, DEL, C1 controls) anywhere in a value.
    # replace() returns a new frame, and regex=True only touches string values.
    clean_df = df.replace(r"[\x00-\x1f\x7f-\x9f]", "", regex=True)
    # Trim any leading or trailing whitespace that remains.
    clean_df = clean_df.replace(r"^\s+|\s+$", "", regex=True)
    # Treat values that are now empty as missing.
    return clean_df.replace("", np.nan)
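To make the intended cleaning concrete for a single value, here is a minimal standard-library sketch. The helper name `clean_value` and the exact character class are assumptions for illustration: the gist does not define "non-printing", so this sketch takes it to mean the ASCII control characters, DEL, and the C1 control range.

```python
import re

# Assumption: "non-printing" means ASCII control characters (tab, newline,
# etc.), DEL, and the C1 control range. The gist does not define the term.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def clean_value(value):
    """Hypothetical helper: clean one cell value, returning None if nothing is left."""
    if not isinstance(value, str):
        return value  # leave numbers, None, etc. untouched
    # Drop control characters first, then trim the surrounding whitespace.
    cleaned = CONTROL_CHARS.sub("", value).strip()
    return cleaned if cleaned else None


print(clean_value("\tAcme\x7f Corp \n"))  # Acme Corp
print(clean_value(" \t "))                # None
print(clean_value(42))                    # 42
```

A helper like this could probably be applied to one column with `Series.map`, though the whole-frame regex approach is likely faster on large data.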