
python -m sharktank.examples.paged_llm_v1 \
--irpa-file=/shark-dev/8b/instruct/weights/llama3.1_8b_instruct_fp16.irpa \
--tokenizer-config-json=/shark-dev/8b/instruct/tokenizer_config.json \
--prompt="Name the capital of the United States"
# /home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/turbine/aot/params.py:163: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
# return torch.from_numpy(wrapper)
# :: Prompt tokens:
# prompt_0:
python -m sharktank.examples.paged_llm_v1 \
--irpa-file=/shark-dev/70b/instruct/weights/llama3.1_70b_instruct_fp16.irpa \
--tokenizer-config-json=/shark-dev/70b/instruct/tokenizer_config.json \
--prompt="Name the capital of the United States."
# /home/chi/src/shark-ai/.venv/lib/python3.12/site-packages/iree/turbine/aot/params.py:163: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
# return torch.from_numpy(wrapper)
# :: Prompt tokens:
# prompt_0:
# b'Name the capital of the United States.'
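The two invocations above differ only in the model size in the paths. A hedged loop sketch that prints the equivalent commands for both sizes (it assumes the `/shark-dev/<size>/instruct` layout from the log holds consistently; it only prints, it does not run anything):

```shell
# Sketch: print the paged_llm_v1 command line for each model size.
# The /shark-dev/<size>/instruct path layout is assumed from the log above.
all_cmds=""
for size in 8b 70b; do
  cmd="python -m sharktank.examples.paged_llm_v1 --irpa-file=/shark-dev/${size}/instruct/weights/llama3.1_${size}_instruct_fp16.irpa --tokenizer-config-json=/shark-dev/${size}/instruct/tokenizer_config.json --prompt='Name the capital of the United States.'"
  all_cmds="$all_cmds$cmd
"
  echo "$cmd"
done
```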
(.venv) ➜ shark-ai git:(main) ✗
huggingface-cli login
# (Hugging Face ASCII-art login banner)
A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
(.venv) ➜ shark-ai git:(ad958230) ✗ git checkout b58783029e0cc3e1890b22339d470c394a66dcb4
M requirements-iree-pinned.txt
Previous HEAD position was ad958230 Add ops.mean for SplitPrimitiveTensor (#1308)
HEAD is now at b5878302 Add perplexity calculation for Tensor and Pipeline parallized Llama models (#1279)
(.venv) ➜ shark-ai git:(b5878302) ✗ pytest sharktank/tests/models/llama/benchmark_amdgpu_test.py -v -s --run-nightly-llama-tests --iree-hip-target=gfx942 --iree-device=hip://4 -k testBenchmark405B_f16_TP8_Non_Decomposed_Input_Len_128
================================================= test session starts ==================================================
platform linux -- Python 3.12.9, pytest-8.0.0, pluggy-1.5.0 -- /home/chi/src/shark-ai/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.9', 'Platform': 'Linux-6.8.0-52-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.0.0', 'pluggy': '1.5.0'}, 'Plugins': {'timeout': '2.3.1', 'anyio': '4.9.0', 'metadata': '3.1.1', 'html': '4
(.venv) ➜ shark-ai git:(b5878302) ✗ git checkout 3a73dc3f233bec037275dc75841a9f6112a32a45
M requirements-iree-pinned.txt
Previous HEAD position was b5878302 Add perplexity calculation for Tensor and Pipeline parallized Llama models (#1279)
HEAD is now at 3a73dc3f [tuner][NFC] remove walk function over input module (#1307)
(.venv) ➜ shark-ai git:(3a73dc3f) ✗ pytest sharktank/tests/models/llama/benchmark_amdgpu_test.py -v -s --run-nightly-llama-tests --iree-hip-target=gfx942 --iree-device=hip://4 -k testBenchmark405B_f16_TP8_Non_Decomposed_Input_Len_128
================================================= test session starts ==================================================
platform linux -- Python 3.12.9, pytest-8.0.0, pluggy-1.5.0 -- /home/chi/src/shark-ai/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.9', 'Platform': 'Linux-6.8.0-52-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.0.0', 'pluggy': '1.5.0'}, 'Plugins': {'timeout': '2.3.1', 'anyio': '4.9.0', 'metadata': '3.1.
(bisect.venv) ➜ bisect python /home/chi/src/iree/build_tools/pkgci/bisect/bisect_packages.py \
--good-ref=9e494bccb5472c5cc8f0910cbc339db8a2ea9aa7 \
--bad-ref=b4e3694a31a92a814a7b127f288e5928705fb3c4 \
--test-script=/sharedfile/attn/bisect/issue1.sh
Welcome to bisect_packages.py!
------------------------------------------------------------------
--------- Configuration ------------------------------------------
------------------------------------------------------------------
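`bisect_packages.py` drives the search by repeatedly running the `--test-script` against candidate package builds; as with `git bisect run`, the script's exit status decides the verdict (0 = good, non-zero = bad). A minimal, hypothetical skeleton of such a script (the `grep` probe is only a stand-in for the real export/compile/run step in `issue1.sh`):

```shell
# Hypothetical bisect test-script skeleton. The probe below stands in for
# the real export/compile/run step; exit status 0 marks the commit good,
# non-zero marks it bad (git-bisect-run convention).
check_commit() {
  # Stand-in probe: pretend the build log contains "ok" on good commits.
  echo "compile ok" | grep -q "ok"
}

if check_commit; then
  verdict="good"
else
  verdict="bad"
fi
echo "test script verdict: $verdict"
```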
# run on SharkMI300X
# cd /sharedfile/attn/bisect
# ./export_run_build.sh
# # Check if a command-line argument is provided
if [ -z "$1" ]; then
  iree_day="042411"
  echo "No flag provided. Using default iree_day $iree_day."
else
  iree_day="$1"
fi
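The argument-default logic in the script above can also be written with a single shell parameter expansion; an equivalent sketch (simulating a call with no arguments):

```shell
# Equivalent default-argument handling via ${1:-default} expansion.
set -- # simulate calling the script with no arguments
iree_day="${1:-042411}"
echo "Using iree_day $iree_day."
```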
(.venv) ➜ shark-ai git:(chi/xfail_f16) ✗ pytest sharktank/tests/models/llama/benchmark_amdgpu_test.py -v -s -m "expensive" --iree-hip-target=gfx942 --iree-device=hip://4 -k testBenchmark8B_fp8_TP1_Non_Decomposed
======================================================================================= test session starts =======================================================================================
platform linux -- Python 3.12.9, pytest-8.0.0, pluggy-1.5.0 -- /home/chi/src/shark-ai/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.9', 'Platform': 'Linux-6.8.0-52-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.0.0', 'pluggy': '1.5.0'}, 'Plugins': {'timeout': '2.3.1', 'anyio': '4.9.0', 'metadata': '3.1.1', 'html': '4.1.1', 'asyncio': '0.23.8', 'xdist': '3.5.0'}}
rootdir: /home/chi/src/shark-ai/sharktank
configfile: pyproject.toml
plugins: timeout-2.3.1, anyio-4.9.0, metadata-3.1.1, html-4.1.1, asyncio-0.23.8, xdist-3.5.0
asyncio: mode=Mode.STRICT
collected 13 items / 12 deselected / 1 selected
# ssh chi@SharkMi300X
# iree-3.4.0rc20250327
# build iree with tracy
git checkout iree-3.4.0rc20250327
cmake -G Ninja -B ../iree-build-trace/ -S . \
-DCMAKE_BUILD_TYPE=Release \
-DIREE_ENABLE_ASSERTIONS=ON \
-DIREE_ENABLE_SPLIT_DWARF=ON \
-DIREE_ENABLE_THIN_ARCHIVES=ON \
-DCMAKE_C_COMPILER_LAUNCHER=ccache \
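The cmake invocation above is cut off in the log after the ccache launcher. For reference, IREE's profiling documentation enables Tracy instrumentation with `-DIREE_ENABLE_RUNTIME_TRACING=ON`; a sketch of a complete configure line, where everything after the flags shown in the log is an assumption based on those docs rather than the command actually run:

```shell
# Sketch only: flags after -DCMAKE_C_COMPILER_LAUNCHER are assumptions
# taken from IREE's Tracy profiling docs, not recovered from the log.
cmake -G Ninja -B ../iree-build-trace/ -S . \
  -DCMAKE_BUILD_TYPE=Release \
  -DIREE_ENABLE_ASSERTIONS=ON \
  -DIREE_ENABLE_SPLIT_DWARF=ON \
  -DIREE_ENABLE_THIN_ARCHIVES=ON \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
  -DIREE_ENABLE_RUNTIME_TRACING=ON
```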
(.venv) ➜ shark-ai git:(cbd6b7a6) ✗ python3 -m sharktank.examples.export_paged_llm_v1 --irpa-file=/sharedfile/attn/fp8_attn.irpa \
--output-mlir=/sharedfile/attn/128/fp8_attn.mlir \
--output-config=/sharedfile/attn/128/config_attn.json \
--bs-prefill=4 --bs-decode=4 --attention-kernel sharktank \
--attention-dtype=float8_e4m3fnuz --activation-dtype=bfloat16 --use-attention-mask --use-hf --kv-cache-dtype=float8_e4m3fnuz
cat_default
conv2d_default
conv2d_default
einsum_2args
elementwise_unary