Run int8 quantization examples using docker, tensorrt or torch-tensorrt on NVIDIA GPU cards
Install torch-tensorrt by docker
The recommended way is to use the prebuilt docker image; see https://github.com/pytorch/TensorRT
The torch-tensorrt version shipped in each NGC PyTorch image is listed in the release notes, e.g. https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-23-12.html
docker pull nvcr.io/nvidia/pytorch:22.05-py3
docker run --gpus device=0 -it --rm nvcr.io/nvidia/pytorch:22.05-py3
Run the example
The example is at https://pytorch.org/TensorRT/_notebooks/vgg-qat.html (notebook source: https://github.com/pytorch/TensorRT/blob/main/notebooks/vgg-qat.ipynb)
The vgg16.py used in the example is at https://github.com/pytorch/TensorRT/blob/main/examples/int8/training/vgg16/vgg16.py
To copy vgg16.py into the container: docker cp ./vgg16.py a072427cbc3e:/workspace, where a072427cbc3e is the container id shown by "docker ps".
Output of the example:
Jit: Average batch time: 4.17 ms
Trt: Average batch time: 0.68 ms
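For context, the deployment step inside that notebook boils down to compiling the QAT TorchScript model with torch_tensorrt and int8 enabled. Below is a minimal sketch of that step plus the kind of timing loop behind the numbers above; the checkpoint name trained_vgg16_qat.jit.pt is an assumption here, and the 32x3x32x32 input shape follows the notebook's CIFAR10 setup.

# Minimal sketch of the notebook's deployment step: compile the QAT
# TorchScript model with int8 enabled, then time it.
# Assumes a checkpoint named trained_vgg16_qat.jit.pt (hypothetical name)
# and CIFAR10-shaped inputs, as in the notebook.
import time
import torch
import torch_tensorrt

qat_model = torch.jit.load("trained_vgg16_qat.jit.pt").eval().cuda()
trt_model = torch_tensorrt.compile(
    qat_model,
    inputs=[torch_tensorrt.Input((32, 3, 32, 32))],
    enabled_precisions={torch.int8},   # lower the Q/DQ graph to int8 kernels
)

x = torch.randn(32, 3, 32, 32, device="cuda")
for _ in range(10):                    # warm-up
    trt_model(x)
torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(100):
    trt_model(x)
torch.cuda.synchronize()
print("Trt: Average batch time: %.2f ms" % ((time.perf_counter() - start) / 100 * 1e3))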
Install tensorrt by docker
Quantize resnet50 using pytorch-quantization (https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#document-tutorials/quant_resnet50) and save it as a quantized *.onnx model; a sketch of this step follows below.
A Chinese walkthrough of the same flow: https://blog.csdn.net/sdhdsf132452/article/details/130136330
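The sketch below condenses the linked tutorial: monkey-patch torchvision's resnet50 with quantized layers, calibrate, then export a Q/DQ ONNX file. It assumes a CUDA GPU and a calibration DataLoader named calib_loader, which is hypothetical and not defined here.

# Minimal sketch of the pytorch-quantization flow from the linked tutorial:
# calibrate a monkey-patched resnet50, then export a Q/DQ ONNX model.
# calib_loader is a hypothetical DataLoader of ImageNet-style images.
import torch
import torchvision
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

quant_modules.initialize()          # replace nn.Conv2d/nn.Linear with quantized versions
model = torchvision.models.resnet50(pretrained=True).eval().cuda()

# Collect activation statistics in fp32, then load the computed amax ranges.
# With the default max calibrator, load_calib_amax() takes no arguments.
with torch.no_grad():
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.disable_quant()
            module.enable_calib()
    for images, _ in calib_loader:  # a few hundred images are typically enough
        model(images.cuda())
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.load_calib_amax()
            module.enable_quant()
            module.disable_calib()

# Export QuantizeLinear/DequantizeLinear nodes that trtexec can parse (opset >= 13).
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "quant_resnet50.onnx", opset_version=13)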
# pull the image
docker pull nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 # available tags: https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/supported-tags.md
docker run --gpus device=0 -it --rm nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
# get trtexec (copy the TensorRT tarball into the container, then unpack it)
docker cp ./TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz 07ec8d27504f:/home
tar -xvzf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
# convert onnx to tensorrt engine
docker cp ./quant_resnet50.onnx 07ec8d27504f:/home
./TensorRT-8.6.1.6/bin/trtexec --int8 --onnx=./quant_resnet50.onnx --saveEngine=quant_resnet50.engine
# run inference on the engine using python
apt install python3 python3-pip
pip install opencv-python
docker cp ./cat.jpg 07ec8d27504f:/home
mkdir images && mv ./cat.jpg ./images/
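The main.py linked at the end of this gist drives the engine through the TensorRT Python API. Below is a minimal sketch of that flow, assuming pycuda and numpy are also installed (pip install tensorrt==8.6.1 pycuda numpy; see the Error section below for the version pin) and an engine built from a static 1x3x224x224 ONNX input; the preprocessing constants are the usual ImageNet values and are assumptions here.

# Minimal sketch of engine inference with the TensorRT Python API.
# Assumes TensorRT 8.6 plus pycuda, and an engine with a static 1x3x224x224 input.
import cv2
import numpy as np
import tensorrt as trt
import pycuda.autoinit          # creates a CUDA context on import
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("quant_resnet50.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Preprocess: BGR -> RGB, resize, normalize with ImageNet stats, NCHW layout.
img = cv2.imread("images/cat.jpg")
img = cv2.resize(img, (224, 224))[:, :, ::-1].astype(np.float32) / 255.0
img = (img - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
h_input = np.ascontiguousarray(img.transpose(2, 0, 1)[None], dtype=np.float32)
h_output = np.empty((1, 1000), dtype=np.float32)

# Copy in, run, copy out. execute_v2 is synchronous.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
cuda.memcpy_htod(d_input, h_input)
context.execute_v2([int(d_input), int(d_output)])
cuda.memcpy_dtoh(h_output, d_output)
print("predicted class id:", int(h_output.argmax()))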
Error
[stdArchiveReader.cpp::nvinfer1::rt::StdArchiveReader::StdArchiveReader::30] Error Code 1: Serialization (Serialization assertion magicTagRead == magicTag failed.Magic tag does not match)
This error is caused by loading the engine with a different TensorRT version than the one used to build it. Check your environment to see if multiple TensorRT or cuDNN libs are involved.
pip install tensorrt==8.6.1 solves the issue, since the engine was generated with TensorRT-8.6.1.6.
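A quick sanity check that the Python runtime matches the builder version:

# The runtime version must match the trtexec build that produced the engine.
import tensorrt as trt
print(trt.__version__)   # expect 8.6.1.x for an engine built with TensorRT-8.6.1.6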
With batch_size=4, the quantized resnet50 runs in 2.2 ms per batch.
The prediction for the cat image is indeed a cat.
main.py: https://www.cnblogs.com/chentiao/p/16671459.html