@pythonlessons
Created August 24, 2023 10:12
transformers_nlp_data
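# Assumption: CustomTokenizer comes from the mltu package; adjust the import if it is defined elsewhere.
from mltu.tokenizers import CustomTokenizer

# es_training_data and en_training_data must be defined earlier
# (the Spanish and English training sentences).

# Prepare the Spanish tokenizer (the input language), working at character level.
tokenizer = CustomTokenizer(char_level=True)
tokenizer.fit_on_texts(es_training_data)
tokenizer.save("tokenizer.json")

# Prepare the English tokenizer (the output language), working at character level.
detokenizer = CustomTokenizer(char_level=True)
detokenizer.fit_on_texts(en_training_data)
detokenizer.save("detokenizer.json")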
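For readers without the tokenizer class at hand, the sketch below illustrates what fitting a character-level tokenizer and saving it to JSON might involve. `SimpleCharTokenizer` and all of its methods are hypothetical names written for this illustration; it is not the `CustomTokenizer` implementation used above, and it omits padding, start/end tokens, and lowercasing.

```python
import json


class SimpleCharTokenizer:
    """Hypothetical stand-in for illustration; not the CustomTokenizer used above."""

    def __init__(self):
        # Index 0 is left unused (commonly reserved for padding),
        # so real characters start at index 1.
        self.char_to_index = {}

    def fit_on_texts(self, texts):
        # Give every previously unseen character the next free index.
        for text in texts:
            for char in text:
                if char not in self.char_to_index:
                    self.char_to_index[char] = len(self.char_to_index) + 1

    def texts_to_sequences(self, texts):
        # Characters missing from the vocabulary are silently skipped.
        return [
            [self.char_to_index[c] for c in text if c in self.char_to_index]
            for text in texts
        ]

    def detokenize(self, sequences):
        # Invert the mapping and join the characters back into strings.
        index_to_char = {i: c for c, i in self.char_to_index.items()}
        return [
            "".join(index_to_char[i] for i in seq if i in index_to_char)
            for seq in sequences
        ]

    def save(self, path):
        # Persist the vocabulary so the same mapping can be reused at inference time.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.char_to_index, f, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        instance = cls()
        with open(path, "r", encoding="utf-8") as f:
            instance.char_to_index = json.load(f)
        return instance


# Usage: fit on two toy Spanish words, then encode and decode one of them.
demo_tokenizer = SimpleCharTokenizer()
demo_tokenizer.fit_on_texts(["hola", "hoy"])
encoded = demo_tokenizer.texts_to_sequences(["hoy"])
decoded = demo_tokenizer.detokenize(encoded)
print(encoded)  # [[1, 2, 5]]
print(decoded)  # ['hoy']
```

Saving the fitted vocabulary matters because the integer assigned to each character depends on the order in which characters were first seen; reloading the JSON file guarantees that inference uses the same mapping as training.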