@eustin
Created June 8, 2020 04:10
import tensorflow as tf

# combine both queries with their search results so the tokeniser sees every word
combined_texts = [query_1, *bing_search_results, query_2, *google_search_results]

tokeniser = tf.keras.preprocessing.text.Tokenizer()
tokeniser.fit_on_texts(combined_texts)

# word indices start at 1; add one so index 0 (reserved for padding) fits in the vocab
vocab_size = max(tokeniser.index_word) + 1
print(vocab_size)
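To see why the `+ 1` is needed without pulling in TensorFlow, here is a minimal sketch of what the Keras `Tokenizer` does: it assigns indices starting at 1 in descending frequency order, leaving index 0 free for the padding token. The strings below are hypothetical stand-ins for `query_1`, `bing_search_results`, and the other inputs.

```python
from collections import Counter

# hypothetical stand-ins for the queries and search results
combined_texts = ["cheap flights sydney", "flights to sydney", "cheap hotels"]

# count words across all texts, like Tokenizer.fit_on_texts
counts = Counter(word for text in combined_texts for word in text.lower().split())

# most frequent word gets index 1, the next gets 2, ...; index 0 is never assigned
word_index = {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

# +1 makes room for padding at index 0, so an embedding layer of size
# vocab_size covers every index that can appear in a padded sequence
vocab_size = max(word_index.values()) + 1
print(vocab_size)  # → 6
```

Because the highest assigned index equals the number of distinct words, `len(tokeniser.word_index) + 1` would give the same result as the `max(...)` form in the gist.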