@rsepassi
Created February 25, 2019 21:55
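import tensorflow_datasets as tfds

# Dataset builder for the IMDB reviews config with a subword-encoded text feature
imdb = tfds.builder("imdb_reviews/subwords8k")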
# Get the TextEncoder from DatasetInfo
encoder = imdb.info.features["text"].encoder
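# Note: more recent TFDS releases expose this class as
# tfds.deprecated.text.SubwordTextEncoder instead of tfds.features.text.
assert isinstance(encoder, tfds.features.text.SubwordTextEncoder)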
# Encode a string to a list of subword ids, then decode it back (round trip)
ids = encoder.encode("Hello world")
assert encoder.decode(ids) == "Hello world"
# Get the vocabulary size
vocab_size = encoder.vocab_size