GCP, Ubuntu instance (Canonical, Ubuntu, 16.04 LTS, amd64 xenial image built on 2020-06-10, supports Shielded VM features)
To get this example to actually work, I followed the official documentation, this blog post by AWS, and this YouTube demo
- Install Java 11
sudo add-apt-repository ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install openjdk-11-jdk
1.1 Install Python 3.7
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.7 python3.7-dev
1.2 Install PIP
sudo apt install python-pip python3-venv python3-pip
pip install --upgrade pip
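To confirm the toolchain is in place, a quick sanity check (the exact version strings will vary with your image):
java -version
python3.7 --version
pip3 --version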
This is no small feat: the CUDA version, the NVIDIA driver, and the PyTorch build must all be aligned.
- First check that you have the right driver for the CUDA version you need (a quick version check is sketched after this list)
- TorchServe requires PyTorch >= 1.5, so check the official PyTorch site for how to install PyTorch for your CUDA version
- Then install the driver and CUDA; for reference only, here is a guide on CUDA 10.0 and driver version 410
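Once the driver and CUDA toolkit are in place, you can verify that the versions line up before touching PyTorch (a quick sketch; assumes the toolkit lives under /usr/local/cuda):
# driver version and the highest CUDA version it supports
nvidia-smi
# version of the installed CUDA toolkit (may require /usr/local/cuda/bin on your PATH)
nvcc --version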
pip install virtualenvwrapper
If you run into pip install issues, this might help. If you have trouble locating virtualenvwrapper.sh, this might help. Also run a pip check to make sure there are no other missing packages.
Lastly, make virtualenv available with Python 3.7 by adding this alias to your ~/.bashrc:
alias mkvirtualenv3='mkvirtualenv --python=`which python3.7` '
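For reference, a typical virtualenvwrapper setup in ~/.bashrc looks roughly like the sketch below (in addition to the alias above); the location of virtualenvwrapper.sh depends on how pip installed it, commonly ~/.local/bin or /usr/local/bin, so adjust the source line to your system:
# where virtualenvs are stored
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=$(which python3)
source $HOME/.local/bin/virtualenvwrapper.sh  # or /usr/local/bin/virtualenvwrapper.sh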
- Create a torchserve3 environment and install torchserve and torch-model-archiver
mkvirtualenv3 torchserve3
pip install torch torchtext torchvision sentencepiece psutil future
pip install torchserve torch-model-archiver
Now torchserve is available in your virtualenv torchserve3.
Check that the GPU is available by running:
python -m torch.utils.collect_env
If you need to uninstall the wrong version of CUDA, see here and here.
- CUDA 10.0 only works with torch==1.2 and torchvision==0.4.0, but TorchServe requires torch>=1.5
- For our specific machine, we need driver nvidia-418, cuda-10.1, and this torch and torchvision install:
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
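After installing, a one-liner to confirm that this build of torch actually sees the GPU and was compiled against CUDA 10.1:
# prints the torch version, the CUDA version it was built with, and whether a GPU is visible
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"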
TorchServe needs a model_store directory from which archived models (*.mar) are served. Mine is at ~/models/model_store/.
Start TorchServe (ideally in a screen) by running:
torchserve --start --model-store ~/models/model_store/
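If you have not used screen before, a minimal pattern (the session name ts is arbitrary):
screen -S ts  # open a new screen session named "ts"
torchserve --start --model-store ~/models/model_store/
# detach with Ctrl-a d; reattach later with: screen -r ts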
You first need to enable SSL. Assuming you are using the keystore method, create a config.properties file with the following:
inference_address=https://0.0.0.0:8080
management_address=https://0.0.0.0:8081
keystore=keystore.p12
keystore_pass=changeit
keystore_type=PKCS12
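If you do not already have a keystore.p12, a self-signed one can be generated for testing with Java's keytool (a sketch; the alias and dname fields are placeholders, and the password must match keystore_pass above):
keytool -genkeypair -keyalg RSA -alias ts -keystore keystore.p12 \
  -storepass changeit -storetype PKCS12 -validity 3600 -keysize 2048 \
  -dname "CN=localhost, OU=dev, O=dev, L=NA, ST=NA, C=US"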
Then start TorchServe (in the same directory as your keystore.p12 and config.properties) with the following:
torchserve --start --model-store ~/models/model_store/ --ts-config config.properties
First clone the TorchServe repo to get access to the example model-file and extra-files:
git clone https://github.com/pytorch/serve.git
- Download a trained model into your model_store directory (mine is ~/models/model_store)
wget https://download.pytorch.org/models/densenet161-8d451a50.pth -P ~/models/model_store
- Archive the model (run this in the parent directory of where your TorchServe repo directory sits)
torch-model-archiver --model-name densenet161 \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file ~/models/model_store/densenet161-8d451a50.pth \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier
- Move the archived model into model_store
mv densenet161.mar ~/models/model_store/
Optionally, you can serve the model directly at startup instead of registering it via the management API later (ideally do this in a screen):
torchserve --start --model-store ~/models/model_store/ --models densenet161=densenet161.mar
To register our DenseNet161 model in ~/models/model_store/:
curl -X POST "http://localhost:8081/models?url=densenet161.mar"
To configure workers, number of GPUs, timeout, etc., see here. We will add GPUs to our densenet161 model like so:
curl -v -X PUT "http://localhost:8081/models/densenet161?min_worker=8&number_gpu=2&synchronous=true"
Using the batch_size (the maximum batch size the model is expected to handle) and max_batch_delay (milliseconds to wait to fill up a batch) flags in the management API, we can enable batch inference like so:
# set batch size to 8 and max delay to 50ms for the model densenet161
curl -X POST "localhost:8081/models?url=densenet161.mar&batch_size=8&max_batch_delay=50"
curl "http://localhost:8081/models"
To see details of models running, for example our DenseNet161:
curl "http://localhost:8081/models/densenet161"
To simply check the health of TorchServe:
curl http://localhost:8080/ping
- Download test image
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
- Send Image to Inference API
curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg
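If you registered the model with batching enabled (batch_size=8, max_batch_delay=50 as above), you can fire several requests concurrently so the server has a chance to group them into a single batch; a rough sketch reusing the same kitten.jpg:
# send 8 requests in parallel; TorchServe waits up to max_batch_delay ms to fill a batch
for i in $(seq 1 8); do
  curl -s -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg &
done
wait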
- Inference API endpoint: http://localhost:8080/
- Management API endpoint: http://localhost:8081/
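When you are done, stop the server with:
torchserve --stop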
- making detectron2 work on torchserve
- Torchserve Dashboard in Streamlit