➜ iree-build ninja iree-opt && tools/iree-opt '--pass-pipeline=builtin.module(func.func(iree-codegen-materialize-device-encoding))' /tmp/xx/iree/compiler/plugins/target/ROCM/test/materialize_encoding_ukernel_gfx942.mlir
[0/2] Re-checking globbed directories...
[4/4] Linking CXX executable tools/iree-opt
=================================================================
==118841==ERROR: AddressSanitizer: heap-use-after-free on address 0x7cd8cc303aec at pc 0x7ff8e297eb6d bp 0x7bf8c725b8f0 sp 0x7bf8c725b8e8
READ of size 4 at 0x7cd8cc303aec thread T5
#0 0x7ff8e297eb6c in mlir::Operation::getPropertiesStorageSize() const /tmp/xx/iree/third_party/llvm-project/mlir/include/mlir/IR/Operation.h:897:18
#1 0x7ff8e297eb6c in mlir::Operation::getAttr(llvm::StringRef) /tmp/xx/iree/third_party/llvm-project/mlir/include/mlir/IR/Operation.h:542:9
#2 0x7ff8e297eb6c in mlir::iree_compiler::IREE::ROCM::TensorUKernelProviderAttr::getDataLayoutForUKernel(mlir::Attribute, mlir::DictionaryAttr) const /tmp/xx/iree/compil
(gdb) thread apply all bt
Thread 7 (Thread 0x7ffff520a6c0 (LWP 1109500) "llvm-worker-5"):
#0 0x00007ffff7aafd71 in __futex_abstimed_wait_common64 (private=32767, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555561966fc4) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=32767, abstime=0x0, clockid=0, expected=0, futex_word=0x555561966fc4) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x555561966fc4, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007ffff7ab27ed in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555561966f70, cond=0x555561966f98) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x555561966f98, mutex=0x555561966f70) at ./nptl/pthread_cond_wait.c:627
#5 0x00005555613da316 in llvm::StdThreadPool::processTasks(llvm::ThreadPoolTaskGroup*) ()
#6 0x00005555613da777 in voi
(gdb) thread apply all bt
Thread 7 (Thread 0x7fffc37fe6c0 (LWP 1644017) "llvm-worker-5"):
#0 0x00007fffd2825d71 in __futex_abstimed_wait_common64 (private=32767, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555f75a20) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=32767, abstime=0x0, clockid=0, expected=0, futex_word=0x555555f75a20) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x555555f75a20, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007fffd28287ed in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555555f759d0, cond=0x555555f759f8) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x555555f759f8, mutex=0x555555f759d0) at ./nptl/pthread_cond_wait.c:627
#5 0x00007fffe35b15cb in std::condition_variable::wait<llvm::StdThreadPool::processTasks(llvm::ThreadPoolTaskGroup*)::$
diff --git a/runtime/src/iree/vm/native_module_packing.h b/runtime/src/iree/vm/native_module_packing.h
index 705b68528a..be0ee5047e 100644
--- a/runtime/src/iree/vm/native_module_packing.h
+++ b/runtime/src/iree/vm/native_module_packing.h
@@ -8,6 +8,7 @@
#define IREE_VM_MODULE_ABI_PACKING_H_
#include <memory>
+#include <numeric>
#include <tuple>
(.venv) ➜ iree-build ninja iree-test-deps
[0/2] Re-checking globbed directories...
[6846/8151] Generating standalone_plugin_riscv_64.o
clang-22: warning: argument unused during compilation: '-fno-plt' [-Wunused-command-line-argument]
[8151/8151] Generating /tmp/xx/iree-build/tests/e2e/matmul/e2e_matmul_cdna4_mxfp4_llama_rocm_hip_matmul.vmfb from e2e_matmul_cdna4_mxfp4_llama_rocm_hip_matmul.mlir
FAILED: tests/e2e/matmul/e2e_matmul_cdna4_mxfp4_llama_rocm_hip_matmul.vmfb /tmp/xx/iree-build/tests/e2e/matmul/e2e_matmul_cdna4_mxfp4_llama_rocm_hip_matmul.vmfb
cd /tmp/xx/iree-build/tests/e2e/matmul && /tmp/xx/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=rocm --iree-hip-target=gfx950 /tmp/xx/iree-build/tests/e2e/matmul/e2e_matmul_cdna4_mxfp4_llama_rocm_hip_matmul.mlir -o /tmp/xx/iree-build/tests/e2e/matmul/e2e_matmul_cdna4_mxfp4_llama_rocm_hip_matmul.vmfb --iree-hal-executable-object-search-path=\"/tmp/xx/iree-build\"
error: <unknown>:0:0: s
hal.executable public @prefill_bs4$async_dispatch_22 {
hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {abi = "hip", iree.encoding.resolver = #iree_gpu.gpu_encoding_resolver<>, iree_codegen.target_info = #iree_gpu.target<arch = "gfx950", features = "", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<MFMA_F32_16x16x32_F16>, <MFMA_F32_32x32x16_F16>, <MFMA_F32_16x16x32_BF16>, <MFMA_F32_32x32x16_BF16>, <MFMA_F32_16x16x128_F8E5M2>, <MFMA_F32_16x16x128_F8E5M2_F8E4M3FN>, <MFMA_F32_16x16x128_F8E4M3FN>, <MFMA_F32_16x16x128_F8E4M3FN_F8E5M2>, <MFMA_F32_32x32x64_F8E5M2>, <MFMA_F32_32x32x64_F8E5M2_F8E4M3FN>, <MFMA_F32_32x32x64_F8E4M3FN>, <MFMA_F32_32x32x64_F8E4M3FN_F8E5M2>, <MFMA_I32_16x16x64_I8>, <MFMA_I32_32x32x32_I8>, <MFMA_F32_16x16x16_BF16>, <MFMA_F32_32x32x8_BF16>, <MFMA_F32_16x16x32_F8E5M2>, <MFMA_F32_16x16x32_F8E5M2_F8E4M3FN>, <MFMA_F32_16x16x32_F8E4M3FN>, <MFMA_F32_16x16x32_F8E4M3FN_F8E5
@bjacob
bjacob / report.txt
Created November 26, 2025 15:26
PC-sampling profile of Llama 405b FP4 prefill on MI350
Data from command:
/tmp/xx/iree-build/tools/iree-benchmark-module --device=hip://0 --device_allocator=caching --hip_use_streams=true --module=/home/ossci/iree-model-benchmark/llama3/tmp/base.405b_fp4.vmfb --parameters=model=/tmp/fp4_preshuffled_2025_09_12.irpa --function=prefill_bs4 --input=@/tmp/args_bs4_2500/prefill_input0_tokens.npy --input=@/tmp/args_bs4_2500/prefill_input1_seq_lens.npy --input=@/tmp/args_bs4_2500/prefill_input2_seq_block_ids.npy --input=@/tmp/args_bs4_2500/prefill_input3_kv_cache_state.npy --benchmark_repetitions=1
Took 41.44 seconds
5330711 samples collected
+-----+----------------------------------------------------+---------------+-----------------+----------------+------------+
| | kernel | duration[s] | % of gpu time | % of samples | selected |
|-----+----------------------------------------------------+---------------+-----------------+----------------+------------|
| 0 | prefill_bs4$async_dispatch_22_reduction_Dx53248x51
bjacob / a.txt
Created November 18, 2025 15:24
---------------------------- live log sessionstart -----------------------------
INFO conftest:conftest.py:39 Pytest benchmark test session is starting
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.0.0, pluggy-1.6.0
rootdir: /home/ossci/iree-test-suites/sharktank_models
configfile: pytest.ini
plugins: anyio-4.11.0, xdist-3.5.0, timeout-2.4.0, subtests-0.15.0, metadata-3.1.1, cov-7.0.0, asyncio-0.23.8, html-4.1.1, retry-1.7.0, reportlog-1.0.0, check-2.6.0
timeout: 600.0s
timeout method: signal
(.venv) ➜ iree-test-suites git:(main) pytest \
-rA \
-m "target_cpu" \
--timeout=300 \
--durations=0 \
--log-cli-level=info
============================================================================================= test session starts =============================================================================================
platform linux -- Python 3.12.3, pytest-8.0.0, pluggy-1.6.0
rootdir: /home/ossci/iree-test-suites
plugins: anyio-4.11.0, xdist-3.5.0, timeout-2.4.0, subtests-0.15.0, metadata-3.1.1, cov-7.0.0, asyncio-0.23.8, html-4.1.1, retry-1.7.0, reportlog-1.0.0, check-2.6.0
bjacob / gist:1b89f90cd8dc5ec3d920a88603667871
Created October 14, 2025 14:42
pipeline_tile_and_fuse.mlir.test failure
➜ iree-build ctest -R iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir.test --output-on-failure
Test project /home/benjacob/iree-build
Start 436: iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir.test
1/1 Test #436: iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir.test ...***Failed 0.68 sec
-- Testing: 1 tests, 1 workers --
FAIL: IREE :: src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir (1 of 1)
******************** TEST 'IREE :: src/iree/compiler/Codegen/LLVMGPU/test/ROCDL/pipeline_tile_and_fuse.mlir' FAILED ********************
Exit Code: 1
Command Output (stderr):