Birch-san / _06_fused_attention_blockptr_jvp.py
Last active August 11, 2025 08:35
Triton fused attention tutorial, updated with JVP support, albeit with atol=1e-3 accuracy on the JVP.
from __future__ import annotations
"""
Fused Attention
===============
This is a Triton implementation of the Flash Attention v2 algorithm from Tri Dao (https://tridao.me/publications/flash2/flash2.pdf)
Credits: OpenAI kernel team
Extra Credits:
import gc
from typing import Tuple
import torch
import torch.nn.functional as F
import triton
import triton.language as tl
import triton.testing
from kernels import get_kernel  # Hugging Face `kernels` hub library
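
# --- Not part of the gist: a minimal sketch of how the quoted atol=1e-3 JVP
# accuracy could be checked. It compares forward-mode derivatives of the Triton
# kernel (here a hypothetical callable `attention` standing in for the gist's
# entrypoint) against a plain PyTorch attention reference via torch.func.jvp.
def _ref_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Plain softmax(Q K^T / sqrt(d)) V reference; fully supported by forward-mode AD.
    scale = q.shape[-1] ** -0.5
    return ((q @ k.transpose(-2, -1)) * scale).softmax(dim=-1) @ v

def _check_jvp(attention, q, k, v, atol: float = 1e-3) -> None:
    # Random tangents paired with each primal input.
    tangents = tuple(torch.randn_like(t) for t in (q, k, v))
    # Reference and kernel outputs, each with their forward-mode derivative.
    out_ref, jvp_ref = torch.func.jvp(_ref_attention, (q, k, v), tangents)
    out_tri, jvp_tri = torch.func.jvp(attention, (q, k, v), tangents)
    torch.testing.assert_close(out_tri, out_ref, atol=atol, rtol=0)
    # The gist's stated tolerance: JVPs agree to within atol=1e-3.
    torch.testing.assert_close(jvp_tri, jvp_ref, atol=atol, rtol=0)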