Skip to content

Instantly share code, notes, and snippets.

@Lexie88rus
Created August 29, 2019 18:32
Show Gist options
  • Save Lexie88rus/8a4787207ec712f2db800fd70a3a003a to your computer and use it in GitHub Desktop.
Save Lexie88rus/8a4787207ec712f2db800fd70a3a003a to your computer and use it in GitHub Desktop.
Title cleaning function
import re
# Lowercase, remove punctuation and numbers from kernel titles
def clean_title(title):
'''
Function to lowercase, remove punctuation and numbers from kernel titles
'''
# lowercase
title = str(title).lower()
# replace punctuation into spaces
title = re.sub(r"[,.;@#?!&$%<>-_*/\()~='+:`]+\ *", " ", title)
title = re.sub('-', ' ', title)
title = re.sub("''", ' ', title)
# replace numbers into spaces
title = re.sub(r"[0123456789]+\ *", " ", title)
#remove duplicated spaces
title = re.sub(' +', ' ', title)
return title.strip()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment