@AmosLewis
Last active February 11, 2025 18:35
ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
iree-benchmark-module \
--hip_use_streams=true \
--module=/sharedfile/2048/fp8_2048.vmfb \
--parameters=model=/sharedfile/llama3_8b_fp8.irpa \
--device=hip://4 \
--function=prefill_bs4 \
--input=4x2048xi64=@/sharedfile/2048/prefill/prefill_token_ids_4x2048xi64.bin \
--input=4xi64=@/sharedfile/2048/prefill/prefill_seq_lens_4xi64.bin \
--input=4x64xi64=@/sharedfile/2048/prefill/prefill_seq_block_ids_4x64xi64.bin \
--input=261x2097152xf8E4M3FNUZ=@/sharedfile/2048/prefill/prefill_cache_state_261x2097152xf8E4M3FNUZ.bin \
--benchmark_repetitions=3
# 2025-02-10T18:56:57-08:00
# Running /home/chi/src/shark-ai/.venv/lib/python3.11/site-packages/iree/_runtime_libs/iree-benchmark-module
# Run on (96 X 3810.79 MHz CPU s)
# CPU Caches:
#   L1 Data 32 KiB (x96)
#   L1 Instruction 32 KiB (x96)
#   L2 Unified 1024 KiB (x96)
#   L3 Unified 32768 KiB (x16)
# Load Average: 1.13, 1.23, 3.41
# ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
# -------------------------------------------------------------------------------------------------------
# Benchmark                                             Time             CPU   Iterations UserCounters...
# -------------------------------------------------------------------------------------------------------
# BM_prefill_bs4/process_time/real_time               725 ms          725 ms            1 items_per_second=1.37975/s
# BM_prefill_bs4/process_time/real_time               727 ms          728 ms            1 items_per_second=1.3762/s
# BM_prefill_bs4/process_time/real_time               727 ms          728 ms            1 items_per_second=1.37512/s
# BM_prefill_bs4/process_time/real_time_mean          726 ms          727 ms            3 items_per_second=1.37703/s
# BM_prefill_bs4/process_time/real_time_median        727 ms          728 ms            3 items_per_second=1.3762/s
# BM_prefill_bs4/process_time/real_time_stddev       1.28 ms         1.41 ms            3 items_per_second=2.42255m/s
# BM_prefill_bs4/process_time/real_time_cv           0.18 %          0.19 %             3 items_per_second=0.18%
@AmosLewis

ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
iree-benchmark-module \
--hip_use_streams=true \
--module=/sharedfile/128/fp8_128.vmfb \
--parameters=model=/sharedfile/llama3_8b_fp8.irpa \
--device=hip://4 \
--function=prefill_bs4 \
--input=4x128xi64=@/sharedfile/128/prefill/prefill_token_ids_4x128xi64.bin \
--input=4xi64=@/sharedfile/128/prefill/prefill_seq_lens_4xi64.bin \
--input=4x64xi64=@/sharedfile/128/prefill/prefill_seq_block_ids_4x64xi64.bin \
--input=261x2097152xf8E4M3FNUZ=@/sharedfile/128/prefill/prefill_cache_state_261x2097152xf8E4M3FNUZ.bin \
--benchmark_repetitions=3
2025-02-11T10:26:13-08:00
Running /home/chi/src/shark-ai/.venv/lib/python3.11/site-packages/iree/_runtime_libs/iree-benchmark-module
Run on (96 X 3810.79 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x96)
  L1 Instruction 32 KiB (x96)
  L2 Unified 1024 KiB (x96)
  L3 Unified 32768 KiB (x16)
Load Average: 3.68, 2.49, 1.34
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
BM_prefill_bs4/process_time/real_time               713 ms          713 ms            1 items_per_second=1.40236/s
BM_prefill_bs4/process_time/real_time               713 ms          714 ms            1 items_per_second=1.4018/s
BM_prefill_bs4/process_time/real_time               714 ms          714 ms            1 items_per_second=1.39994/s
BM_prefill_bs4/process_time/real_time_mean          714 ms          714 ms            3 items_per_second=1.40136/s
BM_prefill_bs4/process_time/real_time_median        713 ms          714 ms            3 items_per_second=1.4018/s
BM_prefill_bs4/process_time/real_time_stddev      0.646 ms        0.592 ms            3 items_per_second=1.26808m/s
BM_prefill_bs4/process_time/real_time_cv           0.09 %          0.08 %             3 items_per_second=0.09%
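
To compare the two runs on a per-token basis, the mean wall time can be pulled out of saved benchmark output and divided into the token count. This is a minimal sketch, not part of the original runs: `mean_ms` and `tokens_per_s` are hypothetical helper names, and the calculation assumes every sequence in the batch is fully populated (batch size times sequence length real tokens), which depends on the contents of the `seq_lens` input. The printed mean is rounded to whole milliseconds, so the result is approximate.

```shell
#!/bin/sh
# Hypothetical helper: print the mean wall time in ms from saved
# iree-benchmark-module output. Works with or without a leading "# ".
mean_ms() {
  awk '/real_time_mean/ {
         # The first "ms" field follows the Time column value.
         for (i = 2; i <= NF; i++)
           if ($i == "ms") { print $(i - 1); exit }
       }' "$1"
}

# Hypothetical helper: tokens per second for one prefill call.
# Usage: tokens_per_s BATCH SEQ_LEN MEAN_MS
# Assumes BATCH * SEQ_LEN tokens are actually processed per call.
tokens_per_s() {
  awk -v b="$1" -v s="$2" -v ms="$3" \
    'BEGIN { printf "%.1f\n", b * s / (ms / 1000) }'
}
```

For example, after saving each run's output to a file, `tokens_per_s 4 2048 "$(mean_ms run_2048.txt)"` and `tokens_per_s 4 128 "$(mean_ms run_128.txt)"` would give the two figures side by side (the file names here are placeholders).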
