@a-agmon
Created June 29, 2020 04:09
# `sequences` is a DataFrame and FEAT_FIELD the name of its text column,
# both defined earlier in the surrounding post
from tensorflow.keras.preprocessing.text import Tokenizer

VOCAB_SIZE = 750
# take just the target feature column
clean_sequences = sequences.loc[:, FEAT_FIELD]
# create a tokenizer with a 750-'word' vocabulary -
# each of the most frequent words is mapped to an integer index
tokenizer = Tokenizer(num_words=VOCAB_SIZE)
# fit the tokenizer on our data
tokenizer.fit_on_texts(clean_sequences)
# word_index maps each word to its frequency rank (1 = most common)
dictionary = tokenizer.word_index
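To make the snippet above concrete without requiring TensorFlow, here is a minimal pure-Python sketch of what `fit_on_texts` and `texts_to_sequences` do under the hood. It assumes simple lowercased whitespace tokenization and frequency-rank indexing starting at 1, mirroring the Keras behaviour where only words whose index is below `num_words` are kept; the texts and function names are illustrative, not from the original gist.

```python
from collections import Counter

def build_word_index(texts):
    # count word frequencies across all texts (lowercased, whitespace split)
    counts = Counter(w for t in texts for w in t.lower().split())
    # rank by frequency, most common first; indices start at 1, as in Keras
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

def texts_to_sequences(texts, word_index, num_words):
    # keep only words whose index is strictly below num_words
    # (Keras silently drops the rest)
    return [[word_index[w] for w in t.lower().split()
             if word_index.get(w, num_words) < num_words]
            for t in texts]

texts = ["open read close", "open write close", "open close"]
wi = build_word_index(texts)
# 'open' and 'close' each appear three times, so they rank highest
print(wi)            # {'open': 1, 'read': 3, 'close': 2, 'write': 4}
print(texts_to_sequences(texts, wi, 3))  # rarer words are dropped
```

Note one subtlety this sketch preserves: because the cutoff is `index < num_words`, Keras effectively keeps the top `num_words - 1` words, so a `VOCAB_SIZE` of 750 yields 749 usable indices.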