@raeq
Created July 19, 2020 13:21
Tokenize a string using NLTK.
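import nltk


def tokenize_text(input_str: str = "") -> list:
    """Split text into runs of word characters and runs of punctuation."""
    # wordpunct_tokenize is regex-based, so it needs no downloaded NLTK data.
    return nltk.wordpunct_tokenize(input_str)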
assert tokenize_text("Good muffins cost $3.88\nin New York.") == [
    "Good",
    "muffins",
    "cost",
    "$",
    "3",
    ".",
    "88",
    "in",
    "New",
    "York",
    ".",
]
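The assertion above shows that "$3.88" is split into four tokens. As a rough illustration of why, here is a minimal sketch that reproduces the same split with only the standard library. It assumes the pattern `\w+|[^\w\s]+`, which is the one NLTK documents for its WordPunctTokenizer; the function name `regex_wordpunct_tokenize` is hypothetical and not part of the gist or of NLTK.

```python
import re

# Assumed pattern: runs of word characters, or runs of characters that are
# neither word characters nor whitespace (i.e. punctuation and symbols).
_WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def regex_wordpunct_tokenize(input_str: str = "") -> list:
    """Hypothetical stand-in for nltk.wordpunct_tokenize using only `re`."""
    return _WORDPUNCT_PATTERN.findall(input_str)


tokens = regex_wordpunct_tokenize("Good muffins cost $3.88\nin New York.")
print(tokens)
```

Because "." is neither a word character nor whitespace, it always becomes its own token, which is why the price is broken apart. If decimal numbers need to stay intact, a different tokenizer or pattern would be required.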