@vinimonteiro
Created October 17, 2021 11:15
Tokenization spacy
import spacy
# Load the small English pipeline (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
text_english = """Imagine this: instead of sending a four-hundred-pound rover vehicle to Mars,
we merely shoot over to the planet a single sphere, one that can fit on the end of a pin.
Using energy from sources around it, the sphere divides itself into a diversified army of
similar spheres. The spheres hang on to each other and sprout features: wheels, lenses,
temperature sensors, and a full internal guidance system. You'd be gobsmacked to watch
such a system discharge itself."""
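# Run the pipeline; the tokenizer splits the text into Token objects
doc = nlp(text_english)
# Collect the surface text of each token (line breaks show up as "\n" tokens)
tokens = [token.text for token in doc]
print(tokens)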
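The token list printed above includes punctuation marks and, because the text spans several lines, whitespace tokens for the line breaks. A minimal sketch of filtering those out, assuming a blank English pipeline (`spacy.blank("en")`) is acceptable in place of `en_core_web_sm` since only the rule-based tokenizer is needed here; the shortened sample text and the variable names `all_tokens` and `words` are illustrative, not part of the original snippet:

```python
import spacy

# Assumption: a blank English pipeline is enough for this sketch, because
# only the rule-based tokenizer is used and no model download is required.
nlp = spacy.blank("en")

# Shortened sample taken from the text above, with one line break kept in.
text = "You'd be gobsmacked to watch\nsuch a system discharge itself."
doc = nlp(text)

# Every token the tokenizer produced, including whitespace and punctuation.
all_tokens = [token.text for token in doc]

# Keep only tokens that are neither whitespace nor punctuation,
# using the lexical flags available on every Token.
words = [token.text for token in doc if not token.is_space and not token.is_punct]

print(all_tokens)
print(words)
```

`token.is_space` and `token.is_punct` are lexical attributes, so they are available without a tagger or parser in the pipeline. The same filter should work unchanged on the `doc` built from `en_core_web_sm`.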