// -----// IR Dump After CheckVHLOStableHloMixUsage (iree-check-vhlostablehlo-mix-usage) //----- //
module {
func.func @sort3D() {
%0 = util.unfoldable_constant dense<[[[1, 2, 3, 4], [4, 3, 2, 1]]]> : tensor<1x2x4xi32>
%1 = "stablehlo.sort"(%0) <{dimension = 2 : i64, is_stable = false}> ({
^bb0(%arg0: tensor<i32>, %arg1: tensor<i32>):
%2 = stablehlo.compare LT, %arg0, %arg1 : (tensor<i32>, tensor<i32>) -> tensor<i1>
stablehlo.return %2 : tensor<i1>
}) : (tensor<1x2x4xi32>) -> tensor<1x2x4xi32>
check.expect_eq_const(%1, dense<[[[1, 2, 3, 4], [1, 2, 3, 4]]]> : tensor<1x2x4xi32>) : tensor<1x2x4xi32>
@bjacob
bjacob / sort3D.rocmasm
Created March 24, 2025 16:04
sort3D asm
.amdgcn_target "amdgcn-amd-amdhsa--gfx942"
.amdhsa_code_object_version 5
.text
.globl _sort3D_dispatch_0_sort_1x2x4xi32_dispatch_tensor_store
.p2align 8
.type _sort3D_dispatch_0_sort_1x2x4xi32_dispatch_tensor_store,@function
_sort3D_dispatch_0_sort_1x2x4xi32_dispatch_tensor_store:
s_load_dwordx2 s[2:3], s[0:1], 0x0
s_waitcnt lgkmcnt(0)
s_branch .LBB0_0
diff --git a/tests/e2e/math/math_ops_llvm-cpu.json b/tests/e2e/math/math_ops_rocm.json
index 57e8c422a7..fc4a6c964a 100644
--- a/tests/e2e/math/math_ops_llvm-cpu.json
+++ b/tests/e2e/math/math_ops_rocm.json
@@ -8,8 +8,8 @@
{
"op": "acos",
"type": "f16",
- "atol": 1.0e-02,
- "rtol": 1.0e-02
::mlir::ParseResult ExpectAlmostEqConstOp::parse(::mlir::OpAsmParser &parser, ::mlir::OperationState &result) {
::llvm::SmallVector<::mlir::OpAsmParser::UnresolvedOperand, 4> deviceOperands;
::llvm::SMLoc deviceOperandsLoc;
(void)deviceOperandsLoc;
::mlir::OpAsmParser::UnresolvedOperand lhsRawOperand{};
::llvm::ArrayRef<::mlir::OpAsmParser::UnresolvedOperand> lhsOperands(&lhsRawOperand, 1); ::llvm::SMLoc lhsOperandsLoc;
(void)lhsOperandsLoc;
::mlir::ElementsAttr valueAttr;
::mlir::FloatAttr toleranceAttr;
::mlir::Type lhsRawType{};
// -----// IR Dump After CSE (cse) //----- //
func.func @__builtin_fill_i64(%arg0: !util.buffer, %arg1: !util.buffer, %arg2: !util.list<!util.buffer>, %arg3: i32, %arg4: i32, %arg5: i32, %arg6: i32, %arg7: i32, %arg8: i32, %arg9: i32, %arg10: i32, %arg11: i32) attributes {translation_info = #iree_codegen.translation_info<pipeline = VMVXDefault>} {
%c12 = arith.constant 12 : index
%c8 = arith.constant 8 : index
%c4 = arith.constant 4 : index
%c0 = arith.constant 0 : index
%c32_i64 = arith.constant 32 : i64
%c1 = arith.constant 1 : index
%buffer_size = util.buffer.size %arg1 : !util.buffer
%0 = util.buffer.load %arg1[%c0 for %c4] : !util.buffer{%buffer_size} -> i32
bjacob / README.md
Created March 4, 2025 20:04
Each hardware-specific microscaling format is a different quantization scheme

This note is about what we can expect once workloads get optimized for microscaling.

Microscaling refers to new hardware providing matrix-multiplication instructions on small-bit-depth operands with separate scale factors. For instance, there is going to be an FP8 matrix-multiplication instruction, accumulating in FP32, with additional FP32 "scale" operands applied as multipliers on the FP8 inputs just before they are multiplied and accumulated. There are also going to be new microscaling instructions for other small-bit-width floating-point and integer types.
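To make the arithmetic concrete, here is a minimal sketch of a scaled multiply-accumulate in plain Python. It is an illustration under stated assumptions, not any vendor's instruction semantics: the function name `scaled_block_dot`, the `block_size` parameter, and the choice to share one scale per contiguous block of elements are all hypothetical. Python floats (float64) stand in for both the FP8 inputs and the FP32 scales and accumulator, so no rounding behavior is modeled.

```python
def scaled_block_dot(a_vals, a_scales, b_vals, b_scales, block_size):
    """Dot product of two block-scaled vectors.

    Assumption (not from the note): each contiguous block of `block_size`
    elements shares one scale factor per operand. Python floats stand in
    for the FP8 inputs and the FP32 scales/accumulator.
    """
    if len(a_vals) != len(b_vals):
        raise ValueError("operands must have the same length")
    if len(a_vals) % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    num_blocks = len(a_vals) // block_size
    if len(a_scales) != num_blocks or len(b_scales) != num_blocks:
        raise ValueError("expected one scale per block for each operand")

    acc = 0.0  # stands in for the wider (FP32) accumulator
    for blk in range(num_blocks):
        for i in range(blk * block_size, (blk + 1) * block_size):
            # Scales are applied to the inputs just before multiply-accumulate.
            acc += (a_scales[blk] * a_vals[i]) * (b_scales[blk] * b_vals[i])
    return acc


if __name__ == "__main__":
    result = scaled_block_dot(
        a_vals=[1.0, 2.0, 3.0, 4.0], a_scales=[0.5, 2.0],
        b_vals=[1.0, 1.0, 1.0, 1.0], b_scales=[1.0, 0.25],
        block_size=2,
    )
    print(result)
```

Changing `block_size`, or which operand carries the scales, changes which stored values represent the same real numbers. That is one way to read the title's claim that each hardware-specific format amounts to a different quantization scheme.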

Different hardware has always had different tile sizes. Different hardware has also supported different element types, but in practice that mostly meant vendors catching up to one another's supported element types. Once the same element type was supported, the differences in tile sizes were layout differ

 tools/iree-opt --iree-hal-conversion ~/b.mlir -debug 2>~/log.mlir

Why am I getting this error when building with this diff?

➜  iree-build ninja
[0/2] Re-checking globbed directories...
[181/317] Generating check_llvm-cpu_local-task_generic_success.mlir_module.vmfb from success.mlir
FAILED: runtime/src/iree/modules/check/test/check_llvm-cpu_local-task_generic_success.mlir_module.vmfb /home/benjacob/iree-build/runtime/src/iree/modules/check/test/check_llvm-cpu_local-task_generic_success.mlir_module.vmfb 
cd /home/benjacob/iree-build/runtime/src/iree/modules/check/test && /home/benjacob/iree-build/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=generic /home/benjacob/iree/runtime/src/iree/modules/check/test/success.mlir -o check_llvm-cpu_local-task_generic_success.mlir_module.vmfb --iree-hal-executable-object-search-path=\"/home/benjacob/iree-build\" --iree-llvmcpu-embedded-linker-path=\"/home/benjacob/iree-build/llvm-project/bin/lld\" --iree-llvmcpu-wasm-linker-path=\"/home/b
diff --git a/home/benjacob/interm-good/compiled_punet_compiled_punet_linked_rocm_hsaco_fb.linked.ll b/home/benjacob/interm-bad/compiled_punet_compiled_punet_linked_rocm_hsaco_fb.linked.ll
index 3aa3582..d2834a5 100644
--- a/home/benjacob/interm-good/compiled_punet_compiled_punet_linked_rocm_hsaco_fb.linked.ll
+++ b/home/benjacob/interm-bad/compiled_punet_compiled_punet_linked_rocm_hsaco_fb.linked.ll
@@ -43173,65 +43173,53 @@ define amdgpu_kernel void @"main$async_dispatch_57_elementwise_2x4096x2560_f16xf
%49 = load <8 x half>, ptr addrspace(1) %48, align 2
%50 = load <1 x float>, ptr addrspace(1) %2, align 4
%51 = fdiv <8 x half> %47, %19
- %52 = fcmp olt <8 x half> %51, zeroinitializer
- %53 = fneg <8 x half> %51
// -----// IR Dump After CSE (cse) //----- //
func.func @_check_reordering_dispatch_0_generic_384_f32() {
%0 = ub.poison : vector<f32>
%cst = arith.constant dense<-0.000000e+00> : vector<1xf32>
%c4 = arith.constant 4 : index
%c384 = arith.constant 384 : index
%c0 = arith.constant 0 : index
%1 = hal.interface.binding.subspan layout(<bindings = [#hal.pipeline.binding<storage_buffer, ReadOnly>, #hal.pipeline.binding<storage_buffer, "ReadOnly|Indirect">, #hal.pipeline.binding<storage_buffer, Indirect>], flags = Indirect>) binding(0) alignment(64) offset(%c0) flags(ReadOnly) : memref<384xf32>
memref.assume_alignment %1, 64 : memref<384xf32>