Triton Ensemble
Gist by thepycoder (21561d51b5880a7d0cdf04041433acaf), last active October 11, 2022 08:45
# Convert a Hugging Face model to ONNX and a TensorRT engine
# (--seq-len sets the minimum, optimal and maximum sequence lengths)
docker run -it --rm --gpus all \
  -v "$PWD":/project ghcr.io/els-rd/transformer-deploy:0.5.1 \
  bash -c "cd /project && \
    convert_model -m \"philschmid/MiniLM-L6-H384-uncased-sst2\" \
    --backend tensorrt onnx \
    --seq-len 16 128 128"
# This writes a triton_models/ folder, a Triton model repository
# that we can now serve with Triton Inference Server
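Before starting the server, it can help to sanity-check the generated folder. A Triton model repository typically has one subdirectory per model, each holding a config.pbtxt and numbered version folders. The helper below, `check_model_repo`, is a hypothetical sketch (it is not part of transformer-deploy or Triton) that lists each model directory and flags any that lack a config.pbtxt; the model names it prints depend on what `convert_model` actually generated.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from transformer-deploy or Triton):
# list every model directory in a Triton model repository and flag
# directories that have no config.pbtxt. Returns 1 if any are missing
# or if the repository directory does not exist.
check_model_repo() {
  local repo="${1:-triton_models}"
  local status=0
  if [ ! -d "$repo" ]; then
    echo "no such directory: $repo" >&2
    return 1
  fi
  for model_dir in "$repo"/*/; do
    [ -d "$model_dir" ] || continue   # glob matched nothing
    local name
    name="$(basename "$model_dir")"
    if [ -f "${model_dir}config.pbtxt" ]; then
      echo "ok      $name"
    else
      echo "missing $name/config.pbtxt"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the directory used for the conversion, e.g. `check_model_repo triton_models`.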
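# Serve the repository: 8000 = HTTP, 8001 = gRPC, 8002 = Prometheus metrics.
# transformers is installed first because the generated tokenizer model
# runs on Triton's Python backend.
docker run -it --rm --gpus all -p8000:8000 -p8001:8001 -p8002:8002 --shm-size 256m \
  -v "$PWD"/triton_models:/models nvcr.io/nvidia/tritonserver:22.07-py3 \
  bash -c "pip install transformers && tritonserver --model-repository=/models"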
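Once Triton reports the models as loaded, the server can be exercised over its HTTP port (8000) using the KServe v2 inference protocol, which exposes `/v2/health/ready` and `/v2/models/<name>/infer`. This is a minimal sketch: the ensemble name `transformer_onnx_inference` and its single BYTES input `TEXT` are assumptions about what `convert_model` generated, so check the names in triton_models/ (or the server's startup log) and override `MODEL_NAME` / `INPUT_NAME` if they differ. The functions `json_escape`, `build_infer_body`, `triton_ready` and `infer` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Assumptions (not from the gist): the ensemble is named
# transformer_onnx_inference and takes one BYTES input called TEXT.
TRITON_URL="${TRITON_URL:-http://localhost:8000}"
MODEL_NAME="${MODEL_NAME:-transformer_onnx_inference}"
INPUT_NAME="${INPUT_NAME:-TEXT}"

# Escape backslashes and double quotes for a JSON string literal.
# Control characters (newlines, tabs) are NOT handled by this sketch.
json_escape() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      '"') out+='\"' ;;
      '\') out+='\\' ;;
      *)   out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# Build a KServe v2 JSON inference request carrying a single string.
build_infer_body() {
  printf '{"inputs":[{"name":"%s","shape":[1],"datatype":"BYTES","data":["%s"]}]}' \
    "$INPUT_NAME" "$(json_escape "$1")"
}

# Succeeds (exit 0) when the server reports itself ready.
triton_ready() {
  curl -sf "$TRITON_URL/v2/health/ready" >/dev/null
}

# POST one sentence to the model and print the JSON response.
infer() {
  curl -sf -X POST "$TRITON_URL/v2/models/$MODEL_NAME/infer" \
    -H 'Content-Type: application/json' \
    -d "$(build_infer_body "$1")"
}
```

For example, `triton_ready && infer "I really enjoyed this movie"` prints the response JSON, whose `outputs` array holds the model's scores for the sentence.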