The GFX1250→GFX942 cross-family ISA transpiler passes 18-19 of 20 tests. The remaining failure is the matmul_splitk test, whose inner loop exits prematurely and non-deterministically. This document details the exhaustive investigation.
The split-K matmul uses two kernels:
matmul_splitk_compute (236 GFX12 instructions): Each workgroup computes a partial matmul for a chunk of K. Uses blockIdx.y for split index.
