lewtun’s gists

lewtun / mcp-bench-sample-prompt.txt

Created September 6, 2025 10:21

MCP-Bench Sample Prompt

	<\|im_start\|>system
	You are a strategic multi-tool AI agent planner for Round 1. Plan parallel execution of independent tools for this round.<\|im_end\|>
	<\|im_start\|>user
	You are a strategic decision-making expert for a multi-tool AI agent using the provided tools to perform the task.

	TASK: "Hey, I’m working on this new dashboard that pulls search results from three different services—one for AI stuff, one for code hosting, and one for edge networking—and I’m scratching my head over how each handles pagination. Some APIs might use a page/page_size setup, others a cursor or next_cursor, and I’m not even sure if all of them support paging in their search calls or if I have to switch to their “list” routes instead.

	Could you dig into each service’s search endpoints and tell me:
	• whether it pages at all or not
	• if it does, what style it uses (page numbers, cursors, etc.)

lewtun / remove_emojis.py

Created July 4, 2025 11:41

	# pip install emoji
	import argparse
	from datasets import load_dataset
	import emoji

	def remove_emoji(text: str) -> str:
	return emoji.replace_emoji(text, replace='').strip()

	def format_messages(x):
	emojis_found = False

lewtun / chat_template.jinja

Last active July 8, 2025 16:01

SmolLM3 chat template

	{# ───── defaults ───── #}
	{%- if enable_thinking is not defined -%}
	{%- set enable_thinking = true -%}
	{%- endif -%}

	{# ───── reasoning mode ───── #}
	{%- if enable_thinking -%}
	{%- set reasoning_mode = "/think" -%}
	{%- else -%}
	{%- set reasoning_mode = "/no_think" -%}

lewtun / grpo_benchmark_v0.py

Created March 3, 2025 14:04

GRPO benchmarking

	from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, set_seed
	import time
	import torch

	set_seed(0)
	device = "cuda"
	model = AutoModelForCausalLM.from_pretrained(
	"Qwen/Qwen2.5-1.5B",
	attn_implementation="flash_attention_2",
	torch_dtype="bfloat16"

lewtun / grpo_vllm.py

Last active January 29, 2025 10:51

GRPO with vLLM demo

	from datasets import load_dataset
	from trl import GRPOConfig, GRPOTrainer
	import random
	"""Usage (on 8 x H100s):

	pip install vllm==0.7.0 --extra-index-url https://download.pytorch.org/whl/cu121
	pip install -e '.[dev]'

	# DDP
	accelerate launch --config_file examples/accelerate_configs/multi_gpu.yaml --num_processes 7 scratch/grpo_demo.py

lewtun / extract_boxed.py

Last active January 9, 2025 12:19

Simple parser to extract the \boxed{answer} parts from LLM completions

	from typing import Optional

	def extract_boxed_solution(text: str) -> Optional[str]:
	"""
	Extracts the content of the last `\boxed{}` in a given LaTeX-style text.

	Args:
	text (str): The input string containing LaTeX-style content.

	Returns:

lewtun / sft_llama.py

Last active November 9, 2024 02:33

SFT Llama 3.1 8B - full training vs LoRA vs QLoRA

	# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
	#
	# Licensed under the Apache License, Version 2.0 (the "License");
	# you may not use this file except in compliance with the License.
	# You may obtain a copy of the License at
	#
	# http://www.apache.org/licenses/LICENSE-2.0
	#
	# Unless required by applicable law or agreed to in writing, software
	# distributed under the License is distributed on an "AS IS" BASIS,

lewtun / dpo_winrate.py

Created September 24, 2024 20:45

DPO with WinRateCallback

	# flake8: noqa
	# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
	#
	# Licensed under the Apache License, Version 2.0 (the "License");
	# you may not use this file except in compliance with the License.
	# You may obtain a copy of the License at
	#
	# http://www.apache.org/licenses/LICENSE-2.0
	#
	# Unless required by applicable law or agreed to in writing, software

lewtun / dbrx-instruct-ifeval.json

Created April 24, 2024 21:48

DBRX-Instruct IFEval scores from LightEval

	{
	"config_general": {
	"lighteval_sha": "?",
	"num_fewshot_seeds": 1,
	"override_batch_size": 4,
	"max_samples": null,
	"job_id": "",
	"start_time": 1163608.425196265,
	"end_time": 1173616.769654949,
	"total_evaluation_time_secondes": "10008.34445868386",

lewtun / view_details.py

Last active February 28, 2024 11:28

View LightEval predictions

	"""
	First install: pip install datasets pandas rich transformers

	Usage:

	# Loglikelihood evals
	python view_details.py --filepath path/to/parquet/details

	# Generative evals
	python view_details.py --filepath path/to/parquet/details --is_generative