@gamingflexer
Created July 19, 2023 09:02
Anthropic's tokenizer for Claude
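from transformers import PreTrainedTokenizerFast

# Load the tokenizer from a local tokenizer JSON file (adjust the path to your copy).
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="/home/ubuntu/LLM/module/claude-v1-tokenization.json")

text = "Hello, this is a test input."
# Split the text into subword token strings.
tokens = fast_tokenizer.tokenize(text)
print(tokens)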
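
A common follow-up to tokenizing is counting how many tokens a prompt uses. The sketch below assumes the same local tokenizer JSON file as the snippet above; `load_tokenizer` and `count_tokens` are hypothetical helper names, not part of the original gist or of any Anthropic API. It relies only on `PreTrainedTokenizerFast.encode`, which returns a list of token IDs.

```python
def load_tokenizer(tokenizer_file: str):
    """Load a fast tokenizer from a tokenizer JSON file (path is supplied by the caller)."""
    # Imported lazily so count_tokens can be used with any object exposing .encode().
    from transformers import PreTrainedTokenizerFast
    return PreTrainedTokenizerFast(tokenizer_file=tokenizer_file)


def count_tokens(tokenizer, text: str) -> int:
    """Return the number of token IDs the tokenizer produces for `text`."""
    return len(tokenizer.encode(text))


# Example usage (assumes the JSON file exists at this path on your machine):
# tokenizer = load_tokenizer("/home/ubuntu/LLM/module/claude-v1-tokenization.json")
# print(count_tokens(tokenizer, "Hello, this is a test input."))
```

The count reflects this particular tokenizer file only; it may not match the token counts of other Claude model versions.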
@danikhan632

@ayansengupta17

Could you provide the link for Claude 3?