Run int8 quantization examples using Docker with TensorRT or Torch-TensorRT on NVIDIA GPU cards
Install torch-tensorrt via Docker
The recommended way is to use the prebuilt Docker image from https://github.com/pytorch/TensorRT
The torch-tensorrt version shipped with each image is listed at https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-23-12.html
docker pull nvcr.io/nvidia/pytorch:22.05-py3
docker run --gpus device=0 -it --rm nvcr.io/nvidia/pytorch:22.05-py3
Run the example
The example is at https://pytorch.org/TensorRT/_notebooks/vgg-qat.html / https://github.com/pytorch/TensorRT/blob/main/notebooks/vgg-qat.ipynb
The vgg16.py used in the example is at https://github.com/pytorch/TensorRT/blob/main/examples/int8/training/vgg16/vgg16.py
To copy vgg16.py into the container, run docker cp ./vgg16.py a072427cbc3e:/workspace, where a072427cbc3e is the container id shown by "docker ps"
Output of the example:
Jit: Average batch time: 4.17 ms
Trt: Average batch time: 0.68 ms
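The "Average batch time" numbers above come from timing repeated forward passes and dividing by the iteration count. A minimal sketch of that measurement, assuming a generic `run_batch` callable standing in for the JIT or TRT module (the real notebook also synchronizes the GPU before reading the clock):

```python
import time

def average_batch_time_ms(run_batch, n_warmup=10, n_iters=100):
    """Time repeated forward passes and return the mean per-batch latency in ms."""
    for _ in range(n_warmup):            # warm up caches/JIT before timing
        run_batch()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / n_iters

# Usage with a stand-in workload; a real benchmark would call model(batch)
# and torch.cuda.synchronize() before/after the timed loop:
print("Average batch time: %.2f ms" % average_batch_time_ms(lambda: sum(range(10000))))
```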
Install TensorRT via Docker
Quantize resnet50 using pytorch-quantization (https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#document-tutorials/quant_resnet50) and save it as a quantized *.onnx model
A Chinese-language walkthrough of the same steps: https://blog.csdn.net/sdhdsf132452/article/details/130136330
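pytorch-quantization works by inserting fake-quantization nodes that simulate int8 math during training/calibration; the core operation is symmetric per-tensor quantization with scale = amax / 127. A NumPy sketch of that quantize/dequantize round trip (illustrative only, not the toolkit's actual API):

```python
import numpy as np

def fake_quant_int8(x):
    """Symmetric per-tensor int8 fake-quantization: quantize, clip, dequantize."""
    amax = np.abs(x).max()               # calibrated dynamic range (max calibration)
    scale = amax / 127.0                 # symmetric int8 range is [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale  # dequantize back to float

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
print(fake_quant_int8(x))                # values snap to the int8 grid
```

The fake-quant round trip is lossy only up to half a quantization step, which is why int8 inference usually costs little accuracy after calibration.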
# pull the image
docker pull nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 # the versions can be checked at https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/supported-tags.md
docker run --gpus device=0 -it --rm nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
# get trtexec
docker cp ./TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz 07ec8d27504f:/home
tar -xvzf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
# convert onnx to tensorrt engine
docker cp ./quant_resnet50.onnx 07ec8d27504f:/home
./TensorRT-8.6.1.6/bin/trtexec --int8 --onnx=./quant_resnet50.onnx --saveEngine=quant_resnet50.engine
# inference engine using python
apt install python3 python3-pip
pip install opencv-python
docker cp ./cat.jpg 07ec8d27504f:/home
mkdir images && mv ./cat.jpg ./images/
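Before the engine can run on cat.jpg, the image has to be turned into the NCHW float32 batch resnet50 expects; that is why opencv-python is installed above. A minimal preprocessing sketch using NumPy (the ImageNet mean/std values are the standard ones, an assumption about what main.py uses; a real script would load the image with cv2.imread and resize with cv2.resize):

```python
import numpy as np

def preprocess(img):
    """HWC uint8 BGR image -> NCHW float32 batch normalized with ImageNet stats."""
    img = img[:, :, ::-1].astype(np.float32) / 255.0   # BGR -> RGB, scale to [0, 1]
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img - mean) / std                           # per-channel normalization
    img = img.transpose(2, 0, 1)                       # HWC -> CHW
    return img[None].astype(np.float32)                # add batch dim -> NCHW

# Stand-in for cv2.resize(cv2.imread("images/cat.jpg"), (224, 224)):
dummy = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
print(preprocess(dummy).shape)  # (1, 3, 224, 224)
```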
Error
[stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::30] Error Code 1: Serialization (Serialization assertion magicTagRead == magicTag failed.Magic tag does not match)
This error is caused by loading the engine using a different TensorRT version than the version used to build the engine. Check your environment to see if multiple TensorRT libs or cuDNN libs are involved.
pip install tensorrt==8.6.1 solves the issue, since the engine was generated with TensorRT-8.6.1.6.
The speed of the quantized resnet50 with batch_size=4 is 2.2 ms per batch.
The prediction for a cat image is really a cat.
main.py (inference script): https://www.cnblogs.com/chentiao/p/16671459.html
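Turning the engine's raw output into the "really a cat" verdict is just a softmax plus argmax over the 1000 ImageNet classes; a sketch of that post-processing step (class 281 is ImageNet's "tabby cat"):

```python
import numpy as np

def predict(logits):
    """Return (class index, probability) from a 1D logits vector."""
    z = logits - logits.max()            # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax
    idx = int(probs.argmax())
    return idx, float(probs[idx])

# Toy 1000-class logits with the cat class boosted, standing in for engine output:
logits = np.zeros(1000, dtype=np.float32)
logits[281] = 8.0
print(predict(logits))
```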