(.venv) ➜ shark-ai git:(ci-new) ✗ /sharedfile/attn/bisect/export_run_f8_8b_tp1.sh
No flag provided. Using default iree_day 0624.
No flag provided. Using default shark_day 0625.
/sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.mlir
/sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.json
/sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.prefill.vmfb
/sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.prefill.txt
File created: /sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.prefill.txt
/sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.decode.txt
File created: /sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.decode.txt
/home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/turbine/aot/params.py:163: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
return torch.from_numpy(wrapper)
Exporting prefill_bs4
/home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/torch/_export/non_strict_utils.py:520: UserWarning: Tensor.T is deprecated on 0-D tensors. This function is the identity in these cases. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3691.)
return func(*args, **kwargs)
Exporting decode_bs4
GENERATED!
Exporting
Saving to '/sharedfile/attn/128/out/fp8_attn_iree0624.shark0625.mlir'
iree-compile prefill:
iree-benchmark-module prefill:
2025-06-26T00:50:08-07:00
Running /home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/_runtime_libs/iree-benchmark-module
Run on (96 X 3810.79 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x96)
L1 Instruction 32 KiB (x96)
L2 Unified 1024 KiB (x96)
L3 Unified 32768 KiB (x16)
Load Average: 3.34, 3.44, 4.07
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
BM_prefill_bs4/process_time/real_time 26.0 ms 34.7 ms 27 items_per_second=38.4154/s
BM_prefill_bs4/process_time/real_time 26.0 ms 33.7 ms 27 items_per_second=38.4211/s
BM_prefill_bs4/process_time/real_time 26.0 ms 33.5 ms 27 items_per_second=38.4211/s
BM_prefill_bs4/process_time/real_time 25.9 ms 33.0 ms 27 items_per_second=38.5715/s
BM_prefill_bs4/process_time/real_time 25.9 ms 32.0 ms 27 items_per_second=38.5609/s
BM_prefill_bs4/process_time/real_time 26.0 ms 34.8 ms 27 items_per_second=38.5081/s
BM_prefill_bs4/process_time/real_time 26.0 ms 34.4 ms 27 items_per_second=38.4985/s
BM_prefill_bs4/process_time/real_time 26.0 ms 33.4 ms 27 items_per_second=38.4589/s
BM_prefill_bs4/process_time/real_time 26.0 ms 33.2 ms 27 items_per_second=38.4172/s
BM_prefill_bs4/process_time/real_time 26.0 ms 32.9 ms 27 items_per_second=38.4721/s
BM_prefill_bs4/process_time/real_time_mean 26.0 ms 33.6 ms 10 items_per_second=38.4745/s
BM_prefill_bs4/process_time/real_time_median 26.0 ms 33.4 ms 10 items_per_second=38.4655/s
BM_prefill_bs4/process_time/real_time_stddev 0.040 ms 0.878 ms 10 items_per_second=0.058964/s
BM_prefill_bs4/process_time/real_time_cv 0.15 % 2.62 % 10 items_per_second=0.15%
iree-compile decode:
iree-benchmark-module decode:
2025-06-26T00:50:39-07:00
Running /home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/_runtime_libs/iree-benchmark-module
Run on (96 X 3810.79 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x96)
L1 Instruction 32 KiB (x96)
L2 Unified 1024 KiB (x96)
L3 Unified 32768 KiB (x16)
Load Average: 4.13, 3.62, 4.11
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_decode_bs4/process_time/real_time 6.71 ms 8.18 ms 108 items_per_second=148.945/s
BM_decode_bs4/process_time/real_time 6.74 ms 8.20 ms 108 items_per_second=148.454/s
BM_decode_bs4/process_time/real_time 6.74 ms 8.22 ms 108 items_per_second=148.421/s
BM_decode_bs4/process_time/real_time 6.79 ms 8.25 ms 108 items_per_second=147.269/s
BM_decode_bs4/process_time/real_time 6.79 ms 8.27 ms 108 items_per_second=147.212/s
BM_decode_bs4/process_time/real_time 6.79 ms 8.26 ms 108 items_per_second=147.182/s
BM_decode_bs4/process_time/real_time 6.80 ms 8.27 ms 108 items_per_second=147.12/s
BM_decode_bs4/process_time/real_time 6.80 ms 8.27 ms 108 items_per_second=147.133/s
BM_decode_bs4/process_time/real_time 6.80 ms 8.29 ms 108 items_per_second=147.047/s
BM_decode_bs4/process_time/real_time 6.80 ms 8.28 ms 108 items_per_second=147.09/s
BM_decode_bs4/process_time/real_time_mean 6.78 ms 8.25 ms 10 items_per_second=147.587/s
BM_decode_bs4/process_time/real_time_median 6.79 ms 8.27 ms 10 items_per_second=147.197/s
BM_decode_bs4/process_time/real_time_stddev 0.033 ms 0.036 ms 10 items_per_second=0.719438/s
BM_decode_bs4/process_time/real_time_cv 0.49 % 0.43 % 10 items_per_second=0.49%
/sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.mlir
/sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.json
/sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.prefill.vmfb
/sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.prefill.txt
File created: /sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.prefill.txt
/sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.decode.txt
File created: /sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.decode.txt
/home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/turbine/aot/params.py:163: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
return torch.from_numpy(wrapper)
Exporting prefill_bs4
/home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/torch/_export/non_strict_utils.py:520: UserWarning: Tensor.T is deprecated on 0-D tensors. This function is the identity in these cases. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3691.)
return func(*args, **kwargs)
Exporting decode_bs4
GENERATED!
Exporting
Saving to '/sharedfile/attn/2048/out/fp8_attn_iree0624.shark0625.mlir'
iree-compile prefill:
iree-benchmark-module prefill:
2025-06-26T00:53:18-07:00
Running /home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/_runtime_libs/iree-benchmark-module
Run on (96 X 3810.79 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x96)
L1 Instruction 32 KiB (x96)
L2 Unified 1024 KiB (x96)
L3 Unified 32768 KiB (x16)
Load Average: 3.39, 3.51, 3.99
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
BM_prefill_bs4/process_time/real_time 275 ms 335 ms 3 items_per_second=3.63362/s
BM_prefill_bs4/process_time/real_time 276 ms 277 ms 3 items_per_second=3.62463/s
BM_prefill_bs4/process_time/real_time 276 ms 365 ms 3 items_per_second=3.62278/s
BM_prefill_bs4/process_time/real_time 277 ms 333 ms 3 items_per_second=3.61569/s
BM_prefill_bs4/process_time/real_time 276 ms 379 ms 3 items_per_second=3.62226/s
BM_prefill_bs4/process_time/real_time 277 ms 349 ms 3 items_per_second=3.61017/s
BM_prefill_bs4/process_time/real_time 278 ms 396 ms 3 items_per_second=3.6021/s
BM_prefill_bs4/process_time/real_time 277 ms 362 ms 3 items_per_second=3.61316/s
BM_prefill_bs4/process_time/real_time 277 ms 406 ms 3 items_per_second=3.61099/s
BM_prefill_bs4/process_time/real_time 278 ms 363 ms 3 items_per_second=3.59128/s
BM_prefill_bs4/process_time/real_time_mean 277 ms 357 ms 10 items_per_second=3.61467/s
BM_prefill_bs4/process_time/real_time_median 277 ms 363 ms 10 items_per_second=3.61442/s
BM_prefill_bs4/process_time/real_time_stddev 0.929 ms 36.6 ms 10 items_per_second=0.0121278/s
BM_prefill_bs4/process_time/real_time_cv 0.34 % 10.27 % 10 items_per_second=0.34%
iree-compile decode:
iree-benchmark-module decode:
2025-06-26T00:53:50-07:00
Running /home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/_runtime_libs/iree-benchmark-module
Run on (96 X 3810.79 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x96)
L1 Instruction 32 KiB (x96)
L2 Unified 1024 KiB (x96)
L3 Unified 32768 KiB (x16)
Load Average: 3.77, 3.59, 4.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------------------
BM_decode_bs4/process_time/real_time 10.0 ms 12.3 ms 70 items_per_second=99.6063/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.4 ms 70 items_per_second=99.489/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.3 ms 70 items_per_second=99.4821/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.4 ms 70 items_per_second=99.4848/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.4 ms 70 items_per_second=98.9208/s
BM_decode_bs4/process_time/real_time 10.2 ms 12.4 ms 70 items_per_second=98.5113/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.4 ms 70 items_per_second=98.7425/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.4 ms 70 items_per_second=98.6015/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.5 ms 70 items_per_second=99.1604/s
BM_decode_bs4/process_time/real_time 10.1 ms 12.4 ms 70 items_per_second=99.1219/s
BM_decode_bs4/process_time/real_time_mean 10.1 ms 12.4 ms 10 items_per_second=99.1121/s
BM_decode_bs4/process_time/real_time_median 10.1 ms 12.4 ms 10 items_per_second=99.1411/s
BM_decode_bs4/process_time/real_time_stddev 0.041 ms 0.044 ms 10 items_per_second=0.402705/s
BM_decode_bs4/process_time/real_time_cv 0.41 % 0.35 % 10 items_per_second=0.41%
(.venv) ➜ shark-ai git:(main) ✗