Last active August 1, 2020 15:21
import string
# Some random document
document = """BigQuery sure makes life easier for data scientists. You can query data for insights, build high quality ML models and easily interface with other Google Cloud services."""
# Remove punctuation
doc_wo_punct = document.translate(str.maketrans('', '', string.punctuation))
# Some keywords we'd like to extract
keywords = ["bigquery", "ML", "insights", "SQL", "analysis"]
# Get terms from the document (lowercased, split on whitespace)
doc_terms = [x.lower().strip() for x in doc_wo_punct.split()]
# Get terms from the document which occur in our keywords list - yay
# Yes, a set lookup is O(1), I know -- this list scan is just illustrative
doc_terms_from_keywords = [x for x in doc_terms if x in [j.lower() for j in keywords]]
print(doc_terms_from_keywords)
# Output: ['bigquery', 'insights', 'ml']
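The comment about set lookups can be made concrete. What follows is a minimal sketch, not part of the original gist: it builds the lowercased keyword set once instead of rebuilding a list for every term. The function name `extract_keywords` is hypothetical and used only for illustration.

```python
import string


def extract_keywords(document, keywords):
    """Return the document's terms (lowercased, in order of appearance)
    that also appear in `keywords`. Hypothetical helper, not from the gist."""
    # Build the lowercase keyword set once; membership tests are O(1) on average
    keyword_set = {k.lower() for k in keywords}
    # Strip punctuation, then split on whitespace
    cleaned = document.translate(str.maketrans('', '', string.punctuation))
    terms = (word.lower() for word in cleaned.split())
    return [term for term in terms if term in keyword_set]


document = """BigQuery sure makes life easier for data scientists. You can query data for insights, build high quality ML models and easily interface with other Google Cloud services."""
keywords = ["bigquery", "ML", "insights", "SQL", "analysis"]

matches = extract_keywords(document, keywords)
print(matches)
# Output: ['bigquery', 'insights', 'ml']
```

The result matches the list-based version. Only the lookup cost changes, which matters mostly when the keyword list is long.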