@apollo-mg
Last active March 10, 2026 03:07
The Definitive RDNA 4 (GFX1201) AI Capability & Ecosystem Master List - March 2026

🛸 RDNA 4 (GFX1201) AI MASTER LIST

Last Updated: March 9, 2026 | Environment: ROCm 7.2 / Poachers Special Ed (PyTorch 2.9.1 / Triton 3.5.1)

🟒 1. THE "GREEN ZONE" (Verified Working Bare-Metal)

  • Flash Linear Attention (FLA): ASCENDED. Liberated from Docker; running bare-metal via Triton kernels.
  • 4-bit Resident Vision: CONFIRMED. Qwen 3.5 4B running in 4.7GB VRAM with ~27-40s prefill.
  • Dual-Core Residency: VERIFIED. Logic (DeepSeek-R1 14B @ 51 tok/s) and Vision (Qwen 3.5 4B) running simultaneously in 16GB VRAM.
  • Triton 3.5.1 + PyTorch 2.9.1: Stable native pairing for GFX1201.
  • Unsloth 4-bit Native: Works perfectly once vLLM/CUDA dependency checks are bypassed.
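
As a rough sanity check on the dual-residency figures above, the sketch below estimates weight memory from parameter count and bit width. Only the 16GB total and the 4.7GB vision footprint come from this list; the 4-bit width for the 14B logic model and the helper name `weight_gb` are assumptions made for illustration. KV cache, activations, and runtime overhead are not modeled.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (10^9 bytes), ignoring quantization metadata."""
    return params_billion * bits_per_weight / 8


TOTAL_VRAM_GB = 16.0      # from the list above
VISION_RESIDENT_GB = 4.7  # resident figure reported for Qwen 3.5 4B above
LOGIC_PARAMS_B = 14.0     # DeepSeek-R1 14B
LOGIC_BITS = 4            # ASSUMPTION: the list does not state the logic model's quantization

logic_gb = weight_gb(LOGIC_PARAMS_B, LOGIC_BITS)
headroom_gb = TOTAL_VRAM_GB - VISION_RESIDENT_GB - logic_gb

print(f"logic weights ~{logic_gb:.1f} GB, headroom ~{headroom_gb:.1f} GB")
```

Under these assumptions the logic weights take about 7 GB, leaving roughly 4.3 GB for KV cache and everything else. A higher-precision logic model would not leave room for both to stay resident.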

🟑 2. THE "YELLOW ZONE" (Functional Workarounds)

  • Prefill Latency: Currently 27-40s. Bottleneck identified in Triton prefill kernels; target is <5s.
  • FP8 Hardware Status: RESEARCHED. GFX1201 supports float8_e4m3fnuz natively, but Triton 3.5.1 lacks intrinsic legalization for it, so FP8 ops fall back to software emulation at a 10x slowdown.
  • Frankenstein Build: Setup uses 24.04 container libraries on 22.04 host. OS migration to 24.04 planned.
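
For readers unfamiliar with the float8_e4m3fnuz format named above, here is a minimal pure-Python decoder for a single byte. It illustrates the number format only (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 8, no infinities, a single NaN encoding). It is not how Triton or ROCm emulate FP8, and the function name `decode_e4m3fnuz` is made up for this sketch.

```python
import math


def decode_e4m3fnuz(byte: int) -> float:
    """Decode one float8_e4m3fnuz bit pattern (0..255) to a Python float."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    if byte == 0x80:
        # The "negative zero" pattern is repurposed as the format's only NaN.
        return math.nan
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * (mantissa / 8) * 2.0 ** (1 - 8)
    # Normal: implicit leading 1, exponent bias of 8.
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 8)


print(decode_e4m3fnuz(0x40), decode_e4m3fnuz(0x7F))
```

The two printed values are 1.0 and the format's largest finite value, 240.0. That narrow range is why FP8 kernels normally carry per-tensor or per-block scale factors alongside the raw bytes.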

🔴 3. THE "RED ZONE" (Confirmed Broken)

  • Native FP8 MatMul: PyTorch addmm and Triton kernels currently fail legalization/intrinsic mapping for GFX1201.
  • Native pip install causal-conv1d: Still blocked by hardcoded NVIDIA/NVCC checks.
  • vLLM Native Linking: ABI drift in PyTorch Nightly breaks binary extension loading (getCurrentHIPStream error).
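
Load failures like the ones listed above usually surface at import time, so a small probe can record which packages load on a given build. The sketch below uses only the standard library; `probe_extension` is a hypothetical helper, not part of any of the packages mentioned.

```python
import importlib


def probe_extension(module_name: str) -> tuple[bool, str]:
    """Try to import a module; return (loaded, detail) with the error text on failure."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # ImportError covers both missing packages and shared-library load failures
        # (for example an "undefined symbol" message from an ABI mismatch). Other
        # exception types can come from hardware checks that run at import time.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Import names of the packages discussed above; results depend on the local build.
    for name in ("torch", "triton", "causal_conv1d", "vllm"):
        loaded, detail = probe_extension(name)
        print(f"{name:15s} {'OK' if loaded else 'FAIL'}  {detail}")
```

Saving the output before and after swapping wheels gives a quick way to see which item moved between zones.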

🏗️ BUILD REPORT: "POACHERS SPECIAL ED"

Methodology: Sovereign Extraction & Infiltration

  1. Liberated optimized RDNA 4 wheels (Torch/Triton/Apex) from rocm/vllm-dev:rocm7.2_navi.
  2. Poached internal Triton kernels (fla, causal_conv1d) directly from container source.
  3. Engineered local shims to strip vllm and cuda dependencies.
  4. Nuclear Patch applied to Unsloth to ignore hardware gatekeeping.
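
The shim in step 3 is not shown in this gist. One common way to satisfy an unwanted import is to register a placeholder module in `sys.modules` before the dependent package is imported. The sketch below shows that general technique under stated assumptions: `install_stub` and the module name `hypothetical_gated_dep` are invented for illustration and are not the actual shims used in this build.

```python
import importlib.machinery
import sys
import types


def install_stub(module_name: str, **attributes) -> types.ModuleType:
    """Register a placeholder module so that `import module_name` succeeds."""
    if module_name in sys.modules:
        # Never replace a module that is already loaded.
        return sys.modules[module_name]
    stub = types.ModuleType(module_name)
    # A spec keeps importlib.util.find_spec() working for code that checks availability.
    stub.__spec__ = importlib.machinery.ModuleSpec(module_name, loader=None)
    stub.__path__ = []  # mark the stub as a package
    for name, value in attributes.items():
        setattr(stub, name, value)
    sys.modules[module_name] = stub
    return stub


# Hypothetical example: a dependency that is only queried for availability.
install_stub("hypothetical_gated_dep", is_available=lambda: False)

import hypothetical_gated_dep

print(hypothetical_gated_dep.is_available())
```

A stub like this only helps when the dependent code merely imports the module or asks whether it is available. Anything that calls into real kernels still needs a working replacement, which is why the Triton kernels in step 2 were taken from the container source.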

Current Verdict: The RX 9070 XT is a fully-functional, resident-capable AI workstation for Logic (14B) + Vision (4B) workflows.
