The main document of interest is DATA-TILING-CPU-VS-GPU.md.
A ZIP archive of all the ai-generated files (pass-specific MLIR logs) should be attached to this gist. EDIT: Unfortunately, gists no longer allow attaching ZIP files. Grrr. The archive is available on request.
This is a snapshot of CPU vs GPU differences in data-tiling on a simple matmul example, taken on April 9, 2026. The IREE commit is 007b1ee.
To reproduce, run:
iree-compile \
matmul_bf16.mlir -o /tmp/matmul_bf16.cpu.vmfb \
--iree-hal-target-backends=llvm-cpu \
--iree-llvmcpu-target-cpu=znver5 \
--iree-dispatch-creation-data-tiling \
-mlir-disable-threading -mlir-print-ir-after-all -mlir-print-ir-before-all \
2>log-cpu.mlir
iree-compile \
matmul_bf16.mlir -o /tmp/matmul_bf16.gpu.vmfb \
--iree-hal-target-backends=rocm \
--iree-hip-target=gfx950 \
--iree-dispatch-creation-data-tiling \
-mlir-disable-threading -mlir-print-ir-after-all -mlir-print-ir-before-all \
2>log-gpu.mlir
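Before reading the two logs in detail, it can help to compare which passes each backend actually runs. The following is a minimal sketch, assuming the upstream MLIR dump header format `// -----// IR Dump After PassName (pass-arg) ... //----- //`. `pass_sequence` is a hypothetical helper, not part of IREE.

```shell
# pass_sequence LOG: hypothetical helper (not IREE tooling).
# Prints pass names from the "IR Dump After" headers in order,
# collapsing consecutive repeats (e.g. one pass run on several functions).
# Assumes headers look like: // -----// IR Dump After Name (arg) ... //----- //
pass_sequence() {
  grep -E '^// -----// IR Dump After ' "$1" | awk '{ print $6 }' | uniq
}
```

In bash, `diff <(pass_sequence log-cpu.mlir) <(pass_sequence log-gpu.mlir)` then shows where the CPU and GPU pipelines diverge. Because only consecutive repeats are collapsed, a pass that runs at several points in the pipeline still shows up once per point.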
Then use AI (I used Cursor's "auto" mode) with this prompt:
In /home/ossci/data-tiling, I have two MLIR before/after-all compilation logs: log-cpu.mlir, log-gpu.mlir. They were both generated by the commands in README.md. My goal is to document the differences and commonalities between CPU and GPU compilation, specifically concerning data-tiling. In the logs, look particularly for anything mentioning #iree_encoding. These "tensor encoding" attributes are the core of where data-tiling happens. Look also for any changes to the following kinds of ops: linalg.matmul, linalg.mmt4d, and inner_tiled. I want to copy out of log-{cpu,gpu}.mlir any before/after pass dump where a relevant change occurs, into its own separate log file. Create an ai-generated/ subdirectory under data-tiling: all your output files should go to ai-generated. I want separate output files (still in the same dir) for each of CPU/GPU, for each pass making a significant difference, and separate before/after files for each. Moreover, I want you to number these files with increasing positive integers reflecting the chronology. If a pass only makes a significant change on one side (e.g. only CPU but not GPU), I still want dumps for both sides, if only to keep the numbering aligned. You should not need to rerun the iree-compile commands: you should only need to read log-{cpu,gpu}.mlir. E.g. if "FooPass" is the first pass making a significant change on CPU, then create 4 files: log-1-before-foo-cpu.mlir, log-1-after-foo-cpu.mlir, log-1-before-foo-gpu.mlir, log-1-after-foo-gpu.mlir. Putting the number 1 before the pass name helps ensure that alphanumeric sorting of files remains useful. Create an overview file, ai-generated/DATA-TILING-CPU-VS-GPU.md, giving a high-level overview while linking to some select .mlir files.
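The mechanical part of that prompt (splitting a log into per-pass dumps and flagging the ones that touch data-tiling constructs) can also be done without AI. Below is a minimal sketch, assuming upstream MLIR's `// -----// IR Dump Before/After PassName (pass-arg) ... //----- //` header format. `split_dumps`, its output layout, and `pairs.txt` are hypothetical names, not IREE tooling, and the files are numbered per dump rather than with the per-pass scheme the prompt asks for.

```shell
# split_dumps LOG OUTDIR: hypothetical helper (not IREE tooling).
# Writes each IR dump body (header stripped) to OUTDIR/NNNNN.mlir and the
# Before/After pairing to OUTDIR/pairs.txt, then prints "BEFORE AFTER PASS"
# for every pass whose IR changed and mentions a data-tiling construct.
split_dumps() {
  local log="$1" out="$2"
  mkdir -p "$out"
  : > "$out/pairs.txt"
  # Assumes headers look like: // -----// IR Dump Before Name (arg) ... //----- //
  # A stack pairs each After with its Before, so nested pipelines still pair up.
  awk -v dir="$out" '
    /^\/\/ -----\/\/ IR Dump (Before|After) / {
      if (file != "") close(file)
      n++
      file = sprintf("%s/%05d.mlir", dir, n)
      printf "" > file                 # create the file even if the body is empty
      if ($5 == "Before") {
        stack[++sp] = n
      } else if (sp > 0) {
        printf "%05d %05d %s\n", stack[sp--], n, $6 >> (dir "/pairs.txt")
      }
      next                             # keep only the IR body in each dump file
    }
    file != "" { print > file }
  ' "$log"

  # Flag pairs whose body changed and that mention a data-tiling construct.
  local pattern='#iree_encoding|linalg\.matmul|linalg\.mmt4d|inner_tiled'
  local before after pass
  while read -r before after pass; do
    if ! cmp -s "$out/$before.mlir" "$out/$after.mlir" &&
       grep -qE "$pattern" "$out/$before.mlir" "$out/$after.mlir"; then
      echo "$before $after $pass"
    fi
  done < "$out/pairs.txt"
}
```

For example, `split_dumps log-cpu.mlir /tmp/cpu-dumps` lists candidate pairs, and `diff /tmp/cpu-dumps/BEFORE.mlir /tmp/cpu-dumps/AFTER.mlir` shows what each one changed. This filter is cruder than "significant change": any pass that rewrites IR in a function containing a matmul gets flagged, so the output is a starting point for manual (or AI-assisted) review rather than a replacement for it.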