@agustinustheo
Created February 22, 2019 12:45
Preprocess Text function for Filtering Fake News Blog
import re


def preproccess_text(text_messages):
    # Expects a single string. Lower-case it so Hello, HELLO, and hello are the same word
    processed = text_messages.lower()
    # Remove noise: bracketed markers such as [12] or [edit], and literal backslash sequences (\\, \r, \t, \n, \)
    processed = re.sub(r'\[[0-9]+\]|\[[a-z]+\]|\[[A-Z]+\]|\\\\|\\r|\\t|\\n|\\', ' ', processed)
    # Replace punctuation with spaces
    processed = re.sub(r'[.,\/#!%\^&\*;\[\]:|+{}=\-\'"_”“`~(’)?]', ' ', processed)
    # Collapse runs of whitespace between terms into a single space
    processed = re.sub(r'\s+', ' ', processed)
    # Remove leading and trailing whitespace
    processed = re.sub(r'^\s+|\s+?$', '', processed)
    return processed
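
To see what each step contributes, here is a minimal sketch that runs the same regular expressions over one made-up headline. The name `clean_demo` and the sample string are assumptions for illustration only and are not part of the gist. The sample contains a literal backslash followed by `n` (two characters), which is the kind of escaped residue the noise pattern targets.

```python
import re

# Patterns copied from the function above; the names are hypothetical.
NOISE = re.compile(r'\[[0-9]+\]|\[[a-z]+\]|\[[A-Z]+\]|\\\\|\\r|\\t|\\n|\\')
PUNCT = re.compile(r'[.,\/#!%\^&\*;\[\]:|+{}=\-\'"_”“`~(’)?]')


def clean_demo(text):
    """Apply the same five steps as the gist's function to one string."""
    text = text.lower()                      # 1. lower-case
    text = NOISE.sub(' ', text)              # 2. bracketed markers, backslash residue
    text = PUNCT.sub(' ', text)              # 3. punctuation becomes a space
    text = re.sub(r'\s+', ' ', text)         # 4. collapse whitespace runs
    return re.sub(r'^\s+|\s+?$', '', text)   # 5. trim both ends


# Raw string: the \n here is a literal backslash plus 'n', not a newline.
sample = r'Breaking NEWS[1]: "Aliens" landed!\n  Read more...'
print(clean_demo(sample))  # breaking news aliens landed read more
```

Because the text is lower-cased first, the `\[[A-Z]+\]` alternative can no longer match by the time it runs; it is harmless, and is kept here only to mirror the original pattern.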