GCP, Ubuntu instance (Canonical, Ubuntu, 16.04 LTS, amd64 xenial image built on 2020-06-10, supports Shielded VM features)
To get this example to actually work, I followed the official documentation, this blog post by AWS, and this YouTube demo
- Install Java 11
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-11-jdk
1.1 Install Python 3.7
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.7 python3.7-dev
1.2 Install PIP
sudo apt install python-pip python3-venv python3-pip
pip install --upgrade pip
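To confirm the toolchain is in place, a quick sanity check (the exact version strings will vary with your image):
java -version
python3.7 --version
pip3 --version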
This is no small feat: the CUDA version, the NVIDIA driver, and the PyTorch build must all be aligned.
- First check that you have the right driver for the CUDA version you need (a quick version check is sketched after this list)
- TorchServe requires PyTorch >= 1.5, so check the official PyTorch site for how to install PyTorch for your CUDA version
- Then install the driver and CUDA; for reference only, here is a guide on CUDA 10.0 and driver version 410
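Once the driver and CUDA toolkit are in place, you can verify that the versions line up before touching PyTorch (a quick sketch; assumes the toolkit lives under /usr/local/cuda):
# driver version and the highest CUDA version it supports
nvidia-smi
# version of the installed CUDA toolkit (may require /usr/local/cuda/bin on your PATH)
nvcc --version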
pip install virtualenvwrapper
If you run into pip install issues, this might help. If you have trouble locating virtualenvwrapper.sh, this might help. Also run a pip check to make sure there are no other missing packages.
Lastly, make virtualenv available with Python 3.7 by adding this alias to your ~/.bashrc:
alias mkvirtualenv3='mkvirtualenv --python=`which python3.7` '
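For reference, a typical virtualenvwrapper setup in ~/.bashrc looks roughly like the sketch below (in addition to the alias above); the location of virtualenvwrapper.sh depends on how pip installed it, commonly ~/.local/bin or /usr/local/bin, so adjust the source line to your system:
# where virtualenvs are stored
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=$(which python3)
source $HOME/.local/bin/virtualenvwrapper.sh  # or /usr/local/bin/virtualenvwrapper.sh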
- Create a torchserve3 environment and install torchserve and torch-model-archiver
mkvirtualenv3 torchserve3
pip install torch torchtext torchvision sentencepiece psutil future
pip install torchserve torch-model-archiver
Now torchserve is available in your virtualenv torchserve3.
Check that the GPU is available by running:
python -m torch.utils.collect_env
If you need to uninstall the wrong version of CUDA, see here and here.
- CUDA 10.0 only works with torch==1.2 and torchvision==0.4.0, but TorchServe requires torch>=1.5
- For our specific machine, we need driver nvidia-418, cuda-10.1, and this torch and torchvision install:
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
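After installing, a one-liner to confirm that this build of torch actually sees the GPU and was compiled against CUDA 10.1:
# prints the torch version, the CUDA version it was built with, and whether a GPU is visible
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"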
TorchServe needs a model_store directory from which archived models (*.mar) are served. Mine is at ~/models/model_store/.
Start TorchServe (ideally in a screen) by running:
torchserve --start --model-store ~/models/model_store/
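If you have not used screen before, a minimal pattern (the session name ts is arbitrary):
screen -S ts  # open a new screen session named "ts"
torchserve --start --model-store ~/models/model_store/
# detach with Ctrl-a d; reattach later with: screen -r ts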
You first need to enable SSL. Assuming you are using the keystore method, create a config.properties file with the following:
inference_address=https://0.0.0.0:8080
management_address=https://0.0.0.0:8081
keystore=keystore.p12
keystore_pass=changeit
keystore_type=PKCS12
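If you do not already have a keystore.p12, a self-signed one can be generated for testing with Java's keytool (a sketch; the alias and dname fields are placeholders, and the password must match keystore_pass above):
keytool -genkeypair -keyalg RSA -alias ts -keystore keystore.p12 \
  -storepass changeit -storetype PKCS12 -validity 3600 -keysize 2048 \
  -dname "CN=localhost, OU=dev, O=dev, L=NA, ST=NA, C=US"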
Then start TorchServe (in the same directory as your keystore.p12 and config.properties) with the following:
torchserve --start --model-store ~/models/model_store/ --ts-config config.properties
First clone the TorchServe repo to get access to the example model-file and extra-files:
git clone https://github.com/pytorch/serve.git
- Download a trained model into your model_store directory (mine is ~/models/model_store)
wget https://download.pytorch.org/models/densenet161-8d451a50.pth -P ~/models/model_store
- Archive the model (run this in the parent directory of where your TorchServe repo directory sits)
torch-model-archiver --model-name densenet161 \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file ~/models/model_store/densenet161-8d451a50.pth \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier
- Move the archived model into model_store
mv densenet161.mar ~/models/model_store/
Optionally, you can serve the model directly at startup instead of registering it via the management API later (ideally do this in a screen):
torchserve --start --model-store ~/models/model_store/ --models densenet161=densenet161.mar
To register our DenseNet161 model in ~/models/model_store/:
curl -X POST "http://localhost:8081/models?url=densenet161.mar"
To configure workers, number of GPUs, timeout, etc., see here. We will add GPUs to our densenet161 model like so:
curl -v -X PUT "http://localhost:8081/models/densenet161?min_worker=8&number_gpu=2&synchronous=true"
Using the batch_size (the maximum batch size the model is expected to handle) and max_batch_delay (milliseconds to wait to fill up a batch) flags in the management API, we can enable batch inference like so:
# set batch size to 8 and max delay to 50ms for the model densenet161
curl -X POST "localhost:8081/models?url=densenet161.mar&batch_size=8&max_batch_delay=50"
curl "http://localhost:8081/models"
To see details of models running, for example our DenseNet161:
curl "http://localhost:8081/models/densenet161"
To simply check the health of TorchServe:
curl http://localhost:8080/ping
- Download test image
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
- Send Image to Inference API
curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg
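If you registered the model with batching enabled (batch_size=8, max_batch_delay=50 as above), you can fire several requests concurrently so the server has a chance to group them into a single batch; a rough sketch reusing the same kitten.jpg:
# send 8 requests in parallel; TorchServe waits up to max_batch_delay ms to fill a batch
for i in $(seq 1 8); do
  curl -s -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg &
done
wait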
- Inference API endpoint: http://localhost:8080/
- Management API endpoint: http://localhost:8081/
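When you are done, stop the server with:
torchserve --stop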
- making detectron2 work on torchserve
- Torchserve Dashboard in Streamlit