Gist AmosLewis/e59aa617657951e6959afd5ec414f33b, last active February 11, 2025 18:35
ROCR_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
iree-benchmark-module \
  --hip_use_streams=true \
  --module=/sharedfile/2048/fp8_2048.vmfb \
  --parameters=model=/sharedfile/llama3_8b_fp8.irpa \
  --device=hip://4 \
  --function=prefill_bs4 \
  --input=4x2048xi64=@/sharedfile/2048/prefill/prefill_token_ids_4x2048xi64.bin \
  --input=4xi64=@/sharedfile/2048/prefill/prefill_seq_lens_4xi64.bin \
  --input=4x64xi64=@/sharedfile/2048/prefill/prefill_seq_block_ids_4x64xi64.bin \
  --input=261x2097152xf8E4M3FNUZ=@/sharedfile/2048/prefill/prefill_cache_state_261x2097152xf8E4M3FNUZ.bin \
  --benchmark_repetitions=3
# 2025-02-10T18:56:57-08:00
# Running /home/chi/src/shark-ai/.venv/lib/python3.11/site-packages/iree/_runtime_libs/iree-benchmark-module
# Run on (96 X 3810.79 MHz CPU s)
# CPU Caches:
#   L1 Data 32 KiB (x96)
#   L1 Instruction 32 KiB (x96)
#   L2 Unified 1024 KiB (x96)
#   L3 Unified 32768 KiB (x16)
# Load Average: 1.13, 1.23, 3.41
# ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
# -------------------------------------------------------------------------------------------------------
# Benchmark                                           Time         CPU  Iterations UserCounters...
# -------------------------------------------------------------------------------------------------------
# BM_prefill_bs4/process_time/real_time             725 ms      725 ms           1 items_per_second=1.37975/s
# BM_prefill_bs4/process_time/real_time             727 ms      728 ms           1 items_per_second=1.3762/s
# BM_prefill_bs4/process_time/real_time             727 ms      728 ms           1 items_per_second=1.37512/s
# BM_prefill_bs4/process_time/real_time_mean        726 ms      727 ms           3 items_per_second=1.37703/s
# BM_prefill_bs4/process_time/real_time_median      727 ms      728 ms           3 items_per_second=1.3762/s
# BM_prefill_bs4/process_time/real_time_stddev     1.28 ms     1.41 ms           3 items_per_second=2.42255m/s
# BM_prefill_bs4/process_time/real_time_cv         0.18 %      0.19 %           3 items_per_second=0.18%
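The summary rows can be cross-checked from the per-repetition counters. Each repetition ran one iteration, and the reported `items_per_second` values are consistent with the reciprocal of the per-repetition real time. Inverting the three rates therefore recovers the unrounded times. The aggregate rows are consistent with a mean and a sample (n - 1) standard deviation, which is how Google Benchmark computes its `_stddev` aggregate. This is a minimal sketch that assumes bash and a POSIX `awk`. The `summarize` function name is hypothetical and is not part of IREE or Google Benchmark.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one items_per_second value per line and prints
# the implied per-repetition time statistics (assumes one item per iteration).
summarize() {
  awk '
    { t[NR] = 1000 / $1; sum += t[NR] }   # rate (items/s) -> time per item (ms)
    END {
      n = NR
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (t[i] - mean) ^ 2
      sd = sqrt(ss / (n - 1))             # sample stddev, n - 1 denominator
      printf "mean=%.1f ms stddev=%.2f ms cv=%.2f%%\n", mean, sd, 100 * sd / mean
    }'
}

# The three per-repetition rates from the run above.
printf '%s\n' 1.37975 1.3762 1.37512 | summarize
# prints: mean=726.2 ms stddev=1.28 ms cv=0.18%
```

The output agrees with the reported 726 ms mean, 1.28 ms stddev and 0.18 % CV. Inverting the rates is more precise than averaging the printed times (725, 727, 727 ms), because those times are rounded to whole milliseconds.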
AmosLewis (author) commented on Feb 11, 2025.