- CPU: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- GPU: NVIDIA V100
- Memory: 251GiB
- OS: Ubuntu 16.04.6 LTS (Xenial Xerus)

Docker images:
- tensorflow/tensorflow:latest-gpu
- tensorflow/serving:latest-gpu
- nvcr.io/nvidia/tensorrtserver:19.10-py3

| Framework | Model | Model type | Images | Batch size | Total time (s) |
|---|---|---|--:|--:|--:|
| TensorFlow | ResNet50 | TF SavedModel | 32000 | 32 | 83.189 |
| TensorFlow | ResNet50 | TF SavedModel | 32000 | 10 | 86.897 |
| TensorFlow Serving | ResNet50 | TF SavedModel | 32000 | 32 | 120.496 |
| TensorFlow Serving | ResNet50 | TF SavedModel | 32000 | 10 | 116.887 |
| Triton (TensorRT Inference Server) | ResNet50 | TF SavedModel | 32000 | 32 | 201.855 |
| Triton (TensorRT Inference Server) | ResNet50 | TF SavedModel | 32000 | 10 | 171.056 |
| Falcon + msgpack + TensorFlow | ResNet50 | TF SavedModel | 32000 | 32 | 115.686 |
| Falcon + msgpack + TensorFlow | ResNet50 | TF SavedModel | 32000 | 10 | 115.572 |

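
Total time is easier to compare once it is turned into throughput and into overhead relative to plain in-process TensorFlow. The following sketch only re-derives numbers from the table; the dictionary and function names are my own, and the per-batch figure assumes `ceil(images / batch_size)` batches were sent.

```python
# Total wall-clock times (seconds) from the table; every run processed 32,000 images.
IMAGES = 32_000

RESULTS = {
    ("TensorFlow", 32): 83.189,
    ("TensorFlow", 10): 86.897,
    ("TensorFlow Serving", 32): 120.496,
    ("TensorFlow Serving", 10): 116.887,
    ("Triton (TensorRT Inference Server)", 32): 201.855,
    ("Triton (TensorRT Inference Server)", 10): 171.056,
    ("Falcon + msgpack + TensorFlow", 32): 115.686,
    ("Falcon + msgpack + TensorFlow", 10): 115.572,
}


def throughput(seconds: float, images: int = IMAGES) -> float:
    """Images processed per second over the whole run."""
    return images / seconds


def ms_per_batch(seconds: float, batch_size: int, images: int = IMAGES) -> float:
    """Average wall-clock time per batch in ms, assuming ceil(images / batch_size) batches."""
    n_batches = -(-images // batch_size)  # ceiling division
    return seconds / n_batches * 1000


def overhead_vs_tensorflow(framework: str, batch_size: int) -> float:
    """Slowdown relative to in-process TensorFlow at the same batch size."""
    return RESULTS[(framework, batch_size)] / RESULTS[("TensorFlow", batch_size)]


if __name__ == "__main__":
    # Print rows from fastest to slowest total time.
    for (framework, bs), secs in sorted(RESULTS.items(), key=lambda kv: kv[1]):
        print(
            f"{framework:<36} bs={bs:<3} "
            f"{throughput(secs):7.1f} img/s  "
            f"{ms_per_batch(secs, bs):6.2f} ms/batch  "
            f"{overhead_vs_tensorflow(framework, bs):4.2f}x"
        )
```

The ms/batch column is an average over the whole run; it equals per-request latency only if requests were sent one at a time, which is an assumption here.
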
By the way, dynamic batching was disabled in all of these runs. As for async: I think these are CPU- and GPU-bound tasks, so they are unlikely to benefit from async. That said, I haven't tried async frameworks such as Starlette, so if you have benchmark results for them, I'd like to see the details.
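
To illustrate why async alone does not speed up bound work, here is a minimal asyncio sketch, assuming the model call is a synchronous, blocking function. `fake_inference` is a hypothetical stand-in that uses `time.sleep` in place of a real TensorFlow call. If the blocking call runs directly inside a coroutine, the event loop cannot serve anything else until it returns; offloading it to a thread pool keeps the server responsive, but the inference itself takes just as long. Starlette offers a similar helper, `starlette.concurrency.run_in_threadpool`.

```python
import asyncio
import time


def fake_inference(batch):
    """Hypothetical stand-in for a blocking model call; sleeps instead of computing."""
    time.sleep(0.2)
    return [x * 2 for x in batch]


async def handle_blocking(batch, events):
    events.append("infer_start")
    result = fake_inference(batch)  # runs on the event-loop thread and blocks it
    events.append("infer_end")
    return result


async def handle_offloaded(batch, events):
    events.append("infer_start")
    loop = asyncio.get_running_loop()
    # Run the blocking call in the loop's default ThreadPoolExecutor.
    result = await loop.run_in_executor(None, fake_inference, batch)
    events.append("infer_end")
    return result


async def health_check(events):
    # A cheap request that arrives while inference is in progress.
    await asyncio.sleep(0.01)
    events.append("health")


async def run(handler):
    events = []
    await asyncio.gather(handler([1, 2, 3], events), health_check(events))
    return events


if __name__ == "__main__":
    print("blocking: ", asyncio.run(run(handle_blocking)))
    print("offloaded:", asyncio.run(run(handle_offloaded)))
```

In the blocking version the health check only runs after inference finishes; in the offloaded version it runs while inference is still in progress. Either way the 0.2 s of "inference" is unchanged, which matches the intuition that async improves responsiveness rather than raw throughput for bound work.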