# Gist AmosLewis/5cdc024ac87355ae44d486142d4d96e0, last active March 4, 2025 17:35
/home/chi/src/iree-build-trace/tools/iree-compile \
  /sharedfile/attn/128/fp8_attn.mlir \
  --iree-hip-target=gfx942 \
  -o=/sharedfile/attn/128/fp8_attn.vmfb \
  --iree-hal-target-device=hip \
  --iree-dispatch-creation-enable-aggressive-fusion=true \
  --iree-global-opt-propagate-transposes=true \
  --iree-opt-aggressively-propagate-transposes=true \
  --iree-opt-data-tiling=false \
  --iree-preprocessing-pass-pipeline='builtin.module(util.func(iree-preprocessing-generalize-linalg-matmul-experimental))' \
  --iree-hal-indirect-command-buffers=true \
  --iree-stream-resource-memory-model=discrete \
  --iree-hal-memoization=true \
  --iree-opt-strip-assertions

ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
iree-benchmark-module \
  --hip_use_streams=true \
  --module=/sharedfile/attn/128/fp8_attn.vmfb \
  --parameters=model=/sharedfile/attn/fp8_attn.irpa \
  --device=hip://4 \
  --function=prefill_bs4 \
  --input=4x128xi64=@/sharedfile/128/prefill/prefill_token_ids_4x128xi64.bin \
  --input=4xi64=@/sharedfile/128/prefill/prefill_seq_lens_4xi64.bin \
  --input=4x4xi64=@/sharedfile/128/prefill/prefill_seq_block_ids_4x4xi64.bin \
  --input=261x2097152xf8E4M3FNUZ=@/sharedfile/128/prefill/prefill_cache_state_261x2097152xf8E4M3FNUZ.bin \
  --benchmark_repetitions=3

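The logs that follow are Google Benchmark console output. The sketch below pulls the `real_time_mean` time out of that kind of output. It assumes the row layout shown in these logs (benchmark name, then value and unit). `mean_time` is a hypothetical helper and is not part of IREE. The mean row only appears when more than one repetition runs, and the command above requests three with `--benchmark_repetitions=3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IREE): print the Time column of the
# *_real_time_mean row from Google Benchmark console output read on stdin.
# It finds the row by its name field instead of a fixed column number, so it
# handles both raw output and the "# "-prefixed copies pasted in this gist.
mean_time() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /real_time_mean$/) {   # name field of the aggregate mean row
        print $(i + 1), $(i + 2)      # value and unit, e.g. "34.9 ms"
        exit
      }
    }
  }'
}
```

Usage is to pipe the benchmark's stdout into it, for example `iree-benchmark-module ... --benchmark_repetitions=3 | mean_time`.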
# [Codegen] Block dyn dims of parallel linalg.generic ops #20091
# +
# BLOCK +
# commit 747c06e68160562e7190ce1c7bf5fe774414b35e (HEAD)
# Author: Ian Wood <[email protected]>
# Date:   Wed Feb 26 10:22:25 2025 -0800
#     Don't hoist sequence-like ops (#20106)
# 2025-03-03T11:48:15-08:00
# Running /home/chi/src/iree-build/tools/iree-benchmark-module
# Run on (96 X 3810.79 MHz CPU s)
# CPU Caches:
#   L1 Data 32 KiB (x96)
#   L1 Instruction 32 KiB (x96)
#   L2 Unified 1024 KiB (x96)
#   L3 Unified 32768 KiB (x16)
# Load Average: 14.30, 10.88, 8.43
# ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
# ***WARNING*** Library was built as DEBUG. Timings may be affected.
# -------------------------------------------------------------------------------------------------------
# Benchmark                                          Time       CPU  Iterations UserCounters...
# -------------------------------------------------------------------------------------------------------
# BM_prefill_bs4/process_time/real_time           98.9 ms    103 ms           7 items_per_second=10.1145/s
# BM_prefill_bs4/process_time/real_time           99.3 ms    104 ms           7 items_per_second=10.0698/s
# BM_prefill_bs4/process_time/real_time           99.5 ms    105 ms           7 items_per_second=10.0548/s
# BM_prefill_bs4/process_time/real_time_mean      99.2 ms    104 ms           3 items_per_second=10.0797/s
# BM_prefill_bs4/process_time/real_time_median    99.3 ms    104 ms           3 items_per_second=10.0698/s
# BM_prefill_bs4/process_time/real_time_stddev   0.305 ms  0.808 ms           3 items_per_second=0.0310233/s
# BM_prefill_bs4/process_time/real_time_cv         0.31 %    0.78 %           3 items_per_second=0.31%
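The derived columns in this table follow from the raw ones, and a short arithmetic check confirms the numbers are consistent. This rests on two assumptions. First, iree-benchmark-module counts one item per iteration, so `items_per_second` is the reciprocal of the per-iteration real time. Second, `cv` is Google Benchmark's standard deviation divided by the mean. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the derived columns of the table above.
# Assumption: one item per iteration, so items/s = 1000 / (time per iteration in ms).
items_per_second() {
  awk -v ms="$1" 'BEGIN { printf "%.2f\n", 1000 / ms }'
}

# Coefficient of variation in percent: 100 * stddev / mean (same units for both).
cv_percent() {
  awk -v sd="$1" -v mean="$2" 'BEGIN { printf "%.2f\n", 100 * sd / mean }'
}

items_per_second 98.9   # prints 10.11 (the log's 10.1145 uses the unrounded time)
cv_percent 0.305 99.2   # prints 0.31, matching the real_time_cv row
```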

# [Codegen] Block dyn dims of parallel linalg.generic ops #20091
# +
# commit 1aff06df0a70b454fea33278bee00705291cdadc (HEAD)
# Author: Zhuoran Yin <[email protected]>
# Date:   Wed Feb 26 09:17:11 2025 -0500
#     [codegen][gpu] Adding conv filter layout fhwc to preprocessing pipeline (#19974)
# 2025-03-03T11:56:26-08:00
# Running /home/chi/src/iree-build/tools/iree-benchmark-module
# Run on (96 X 3810.79 MHz CPU s)
# CPU Caches:
#   L1 Data 32 KiB (x96)
#   L1 Instruction 32 KiB (x96)
#   L2 Unified 1024 KiB (x96)
#   L3 Unified 32768 KiB (x16)
# Load Average: 4.28, 5.18, 6.54
# ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
# ***WARNING*** Library was built as DEBUG. Timings may be affected.
# -------------------------------------------------------------------------------------------------------
# Benchmark                                          Time       CPU  Iterations UserCounters...
# -------------------------------------------------------------------------------------------------------
# BM_prefill_bs4/process_time/real_time           34.8 ms   38.1 ms          20 items_per_second=28.7463/s
# BM_prefill_bs4/process_time/real_time           34.9 ms   38.0 ms          20 items_per_second=28.6736/s
# BM_prefill_bs4/process_time/real_time           34.9 ms   37.2 ms          20 items_per_second=28.6206/s
# BM_prefill_bs4/process_time/real_time_mean      34.9 ms   37.8 ms           3 items_per_second=28.6802/s
# BM_prefill_bs4/process_time/real_time_median    34.9 ms   38.0 ms           3 items_per_second=28.6736/s
# BM_prefill_bs4/process_time/real_time_stddev   0.077 ms  0.474 ms           3 items_per_second=0.0630962/s
# BM_prefill_bs4/process_time/real_time_cv         0.22 %    1.26 %           3 items_per_second=0.22%
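The two runs can be compared through their `real_time_mean` rows. The first build (#20091 + BLOCK on commit 747c06e) averaged 99.2 ms. The second build (#20091 on commit 1aff06d) averaged 34.9 ms. The two builds differ in more than one respect, so this ratio alone does not isolate which change caused the difference. The helper names in the sketch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers comparing two mean real times (ms), copied from the logs above.
# Ratio of the first run's time to the second's (>1 means the second run is faster).
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

# Percentage of wall-clock time saved going from the first run to the second.
time_saved_percent() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", 100 * (before - after) / before }'
}

speedup 99.2 34.9              # prints 2.84
time_saved_percent 99.2 34.9   # prints 64.8
```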