
Zain Hasan (zainhas)

zainhas / API_prefill_decode_speed.py
Created February 8, 2025 17:36
Calculate prefill and decoding speed of a Model API endpoint
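import time
from together import Together
from transformers import AutoTokenizer  # added: `tokenizer` was used but never defined

# With no api_key argument, the Together client reads the TOGETHER_API_KEY environment variable.
client = Together()
# Hypothetical choice: load the tokenizer of whichever model the endpoint serves.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")
prompt = "How many r's in the word strawberry?"
prefill_tokens_len = len(tokenizer.encode(prompt))  # prompt tokens processed during prefill
decode_text = ""  # accumulates the generated text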
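The preview above stops before the timing loop. Below is a minimal sketch of how the two speeds could be computed from a streamed response, assuming prefill speed is the prompt token count divided by the time to first token, and decode speed is the number of generated tokens divided by the time between the first and last text chunk (one common convention, not necessarily the gist's). The function `measure_stream` and its `open_stream`, `count_tokens`, and `clock` parameters are hypothetical; the clock is injectable so the arithmetic can be checked without a network call.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def measure_stream(
    open_stream: Callable[[], Iterable[str]],
    prefill_tokens: int,
    count_tokens: Callable[[str], int],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time one streamed request and return prefill/decode throughput in tokens/s."""
    start = clock()                 # taken just before the request is opened
    chunks = open_stream()          # opening the stream sends the request
    first_token_at: Optional[float] = None
    text = ""
    for chunk in chunks:
        if chunk and first_token_at is None:
            first_token_at = clock()  # first non-empty chunk marks the end of prefill
        text += chunk
    end = clock()
    if first_token_at is None:
        raise ValueError("stream produced no text")

    ttft = first_token_at - start   # time to first token
    decode_time = end - first_token_at
    decode_tokens = count_tokens(text)
    return {
        "ttft_s": ttft,
        "prefill_tok_per_s": prefill_tokens / ttft if ttft > 0 else float("inf"),
        "decode_tok_per_s": decode_tokens / decode_time if decode_time > 0 else float("inf"),
        "decode_tokens": decode_tokens,
    }
```

With the gist's `client`, `tokenizer`, and `prefill_tokens_len`, a call might look like `measure_stream(lambda: (c.choices[0].delta.content or "" for c in client.chat.completions.create(model="...", messages=[{"role": "user", "content": prompt}], stream=True)), prefill_tokens_len, lambda s: len(tokenizer.encode(s)))`. Passing a zero-argument callable keeps the request inside the timed window. Because `prefill_tokens_len` counts only the raw prompt and not any chat-template tokens the server adds, the prefill figure is approximate.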
zainhas / thinking_tokens.py
Last active February 18, 2025 16:01
Extract ONLY thinking tokens from DeepSeek-R1
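import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])  # was an undefined name
question = "Which is larger 9.9 or 9.11?"
# Stop generation at the closing </think> tag so the response holds only the reasoning.
thought = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": question}],
    stop=["</think>"],
)
print(thought.choices[0].message.content)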
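Because generation stops at `</think>`, the returned message should contain only the reasoning, although it may still begin with the opening `<think>` tag depending on how the endpoint formats output (an assumption worth checking). If the full completion is requested instead, the same tag can separate the reasoning from the final answer. A minimal sketch, assuming DeepSeek-R1's `<think>...</think>` wrapping; the helper `split_thinking` is hypothetical.

```python
from typing import Tuple

OPEN_TAG, CLOSE_TAG = "<think>", "</think>"


def split_thinking(text: str) -> Tuple[str, str]:
    """Split a DeepSeek-R1-style completion into (thinking, answer).

    Assumes the reasoning is wrapped as <think>...</think> and the answer follows.
    If no closing tag is present (e.g. generation was stopped at "</think>"),
    the whole text is treated as thinking and the answer is empty.
    """
    body = text.lstrip()
    if body.startswith(OPEN_TAG):
        body = body[len(OPEN_TAG):]          # drop the optional opening tag
    thinking, found, answer = body.partition(CLOSE_TAG)
    if not found:
        return thinking.strip(), ""
    return thinking.strip(), answer.strip()
```

For the stopped request above, `thinking, _ = split_thinking(thought.choices[0].message.content)` yields just the reasoning text.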
zainhas / gist:e15120eb7f9dcbdbeaf7575d7e6fe8c8
Created January 24, 2025 00:32
Extract ONLY thinking tokens from DeepSeek-R1
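import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])  # was an undefined name
question = "Which is larger 9.9 or 9.11?"
# Stop generation at the closing </think> tag so the response holds only the reasoning.
thought = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": question}],  # reuse `question` instead of repeating the string
    stop=["</think>"],
)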