Last active
April 7, 2025 20:13
Convert tiktoken tokenizers to the Hugging Face tokenizers format
Shouldn't 'tokenizer_class' be 'GPT2Tokenizer' in all cases? This is the Hugging Face concrete class that gets instantiated, i.e. by doing this you can use
Rather than GPT2TokenizerFast (which then generates a warning).
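To make the suggestion concrete, here is a minimal sketch of setting that field when writing the tokenizer config. It assumes the converted tokenizer directory holds a `tokenizer_config.json` (the usual Hugging Face file name) and that 'tokenizer_class' is a top-level key in it; the helper `write_tokenizer_config` is hypothetical and not taken from the gist, and whether the warning goes away depends on how the tokenizer is loaded afterwards.

```python
import json
import tempfile
from pathlib import Path


def write_tokenizer_config(output_dir, tokenizer_class="GPT2Tokenizer"):
    """Write a minimal tokenizer_config.json (hypothetical helper, not from the gist)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: 'tokenizer_class' is the only field set here; a real
    # conversion would likely add more (special tokens, max length, ...).
    config = {"tokenizer_class": tokenizer_class}

    config_path = output_dir / "tokenizer_config.json"
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config_path


# Usage: write the config to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = write_tokenizer_config(tmp)
    saved = json.loads(path.read_text(encoding="utf-8"))
    print(saved["tokenizer_class"])
```

The class name is left as a parameter so the two options discussed above can be compared without editing the helper.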