Jesse (createthis) — public gists
@createthis
createthis / bench_indexer_tilelang.py
Created November 12, 2025 13:14
bench_indexer_tilelang.py
#!/usr/bin/env python3
import argparse
import torch

# Prefer local examples path resolution if running from repo root
try:
    from examples.deepseek_v32.utils import per_custom_dims_cast_to_fp8 as _to_fp8

    def to_fp8(x):
        # Cast along last dim to FP8 E4M3 to match kernel expectations.
        # Handle both (x, dims, use_ue8m0) and (x, dims) signatures and
        # return the scaled tensor only.
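The preview cuts off before the wrapper body. A minimal sketch of how a dual-signature shim like the one the comments describe is typically written — the helper name and both signatures come from the snippet's own comments, but the stand-in `_to_fp8` body and the `TypeError` fallback are assumptions, not the gist's actual code:

```python
# Sketch only: _to_fp8 stands in for
# examples.deepseek_v32.utils.per_custom_dims_cast_to_fp8 (assumed behavior).
def _to_fp8(x, dims, use_ue8m0):
    # Pretend (scaled_tensor, scale) return value for illustration.
    return x, 1.0

def to_fp8(x):
    # Try the (x, dims, use_ue8m0) signature first; on TypeError fall back
    # to (x, dims). Either way, return only the scaled tensor.
    try:
        out = _to_fp8(x, (-1,), False)
    except TypeError:
        out = _to_fp8(x, (-1,))
    return out[0] if isinstance(out, tuple) else out
```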
@createthis
createthis / bench_topk_tilelang.py
Created November 12, 2025 13:17
bench_topk_tilelang.py
#!/usr/bin/env python3
import argparse
import time
import torch
# TileLang example kernels
from examples.deepseek_v32.topk_selector import tl_topk, tl_topk_impl
def bench_tl_topk(seq_len: int, topk: int = 256, batch: int = 1, iters: int = 50, warmup: int = 5):
    torch.cuda.synchronize()
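The body of `bench_tl_topk` is not shown in the preview. A generic sketch of the warmup-then-timed-iterations pattern such benchmarks use, with a stand-in workload instead of `tl_topk` (the real gist would bracket the timed region with `torch.cuda.synchronize()` so GPU work is finished before reading the clock):

```python
import time

def bench(fn, iters=50, warmup=5):
    # Warm-up runs are excluded from timing (JIT compilation, caches).
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    # Return average milliseconds per iteration.
    return (time.perf_counter() - t0) / iters * 1e3

avg_ms = bench(lambda: sum(range(10_000)))
```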
@createthis
createthis / dump_indexer_tilelang.py
Created November 12, 2025 18:50
dump_indexer_tilelang.py
#!/usr/bin/env python3
import argparse
import torch
import os
import sys
from typing import Optional
# Optional TVM runtime import to dump CUDA/PTX sources
import tilelang
from tilelang import tvm
@createthis
createthis / mqa_attn_return_logits_kernel.cu
Created November 12, 2025 18:52
mqa_attn_return_logits_kernel.cu
#include <tl_templates/cuda/cuda_fp8.h>
#include <tl_templates/cuda/gemm.h>
#include <tl_templates/cuda/copy.h>
#include <tl_templates/cuda/reduce.h>
#include <tl_templates/cuda/ldsm.h>
#include <tl_templates/cuda/threadblock_swizzle.h>
#include <tl_templates/cuda/debug.h>
#ifdef ENABLE_BF16
#include <tl_templates/cuda/cuda_bf16_fallbacks.cuh>
#endif
@createthis
createthis / DeepSeek_V3_2.md
Created November 19, 2025 19:43
DeepSeek_V3_2 PDF converted to Markdown using DeepSeek OCR

DeepSeek-V3.2-Exp: Boosting Long-Context Efficiency with DeepSeek Sparse Attention

DeepSeek-AI

research@deepseek.com

Abstract

We introduce DeepSeek-V3.2-Exp, an experimental sparse-attention model, which equips DeepSeek-V3.1-Terminus with DeepSeek Sparse Attention (DSA) through continued training. With DSA, a fine-grained sparse attention mechanism powered by a lightning indexer, DeepSeek-V3.2-Exp achieves significant efficiency improvements in both training and inference, especially in long-context scenarios. The model checkpoints are available at https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp.
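The abstract does not show the mechanism itself. As a rough illustration of the general shape of fine-grained sparse attention — not the paper's actual implementation — a cheap indexer scores every past token per query, and full attention then runs only over the top-k highest-scoring tokens:

```python
def select_topk(scores, k):
    # scores: indexer logits for one query over all previous tokens.
    # Keep the indices of the k largest scores; attention is computed
    # only over this subset instead of the whole context.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

keep = select_topk([0.1, 2.5, -1.0, 0.7, 3.2], k=2)  # -> [1, 4]
```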

@createthis
createthis / chat_template.jinja
Created December 29, 2025 17:39
DeepSeek 3.2 chat template
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% if not thinking is defined %}{% set thinking = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, system_prompt='', is_first_sp=true, is_last_user=false, is_only_sys=false, is_prefix=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{% set ns.is_only_sys = true %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- if ns.i
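The template above is truncated mid-expression. As a plain-Python illustration (not the template itself) of the two behaviors visible in the preview — consecutive system messages are accumulated into one prompt separated by blank lines, and user turns are wrapped with the `<|User|>` tag after the BOS token — one might write (the `bos_token` default here is an assumption):

```python
def render(messages, bos_token="<BOS>"):
    # Merge all system messages into one prompt, blank-line separated,
    # mirroring the ns.system_prompt accumulation in the Jinja template.
    system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    out = bos_token + system
    for m in messages:
        if m["role"] == "user":
            out += "<|User|>" + m["content"]
    return out
```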