Created
October 17, 2021 11:15
Tokenization spacy
import spacy

# Load the small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

text_english = """Imagine this: instead of sending a four-hundred-pound rover vehicle to Mars,
we merely shoot over to the planet a single sphere, one that can fit on the end of a pin.
Using energy from sources around it, the sphere divides itself into a diversified army of
similar spheres. The spheres hang on to each other and sprout features: wheels, lenses,
temperature sensors, and a full internal guidance system. You'd be gobsmacked to watch
such a system discharge itself."""

# Run the pipeline; the tokenizer splits the text into Token objects
doc = nlp(text_english)

# Collect the text of every token (punctuation and newline tokens included)
tokens = [token.text for token in doc]
print(tokens)
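
The token list printed above also contains punctuation marks and the newline characters from the multi-line string, since spaCy keeps them as tokens. If only the words are wanted, a minimal sketch along these lines could filter them out. The helper name `content_words` is hypothetical, and `spacy.blank("en")` is used here on the assumption that the tokenizer alone is enough for this step (it avoids downloading `en_core_web_sm`, and should produce the same token boundaries for English text).

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is sufficient here,
# so no trained model needs to be downloaded.
nlp = spacy.blank("en")


def content_words(text):
    """Return token texts, skipping punctuation and whitespace tokens.

    Hypothetical helper for illustration; not part of spaCy.
    """
    doc = nlp(text)
    return [
        token.text
        for token in doc
        # is_punct and is_space are lexical flags set by the tokenizer
        if not token.is_punct and not token.is_space
    ]


sample = "You'd be gobsmacked to watch\nsuch a system discharge itself."
print(content_words(sample))
```

The same filter works on the `doc` built from `text_english`: replace `token.text for token in doc` with the filtered comprehension to drop the commas, colons, periods and line breaks from the output.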