- Intro
- NVIDIA RTX A4000
- NVIDIA RTX A2000
- RTX 4000
- NVIDIA T1000
- NVIDIA RTX 2080 Ti
- Quadro P4000
- Conslusion
Input data:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Duration: 01:23:05.33, start: 0.000000, bitrate: 3005 kb/s
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 2870 kb/s, 30 fps, 30 tbr, 90k tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
PC Specs:
- i9-10900k water cooled
- 64 GB Ram @ 3Ghz
- Ubuntu 21.10 kernel 5.13.0-28-generic
- nvidia-headless-470
- nvidia-docker2
Custom build FFMPEG(bethrezen/ffmpeg-rtmp-nvenc
docker image(sha256:0c17d7133066e0e305aba51d9abd28a4b07259c8dd719c1d072d073fd3baa6fb) based on Cuda 10.2):
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --enable-version3 --enable-gpl --enable-nonfree --enable-small --enable-libmp3lame --enable-libx264 --enable-libtheora --enable-libvorbis --enable-libopus --enable-libfdk-aac --enable-libass --enable-librtmp --enable-postproc --enable-avresample --enable-libfreetype --enable-openssl --enable-avfilter --enable-libxvid --enable-pic
--enable-shared --enable-pthreads --enable-shared --enable-nvenc --enable-cuda --enable-opencl --enable-cuvid --enable-libnpp --disable-stripping --disable-static --disable-debug --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Fullhd -> Fullhd command:
time ffmpeg -y -v info -hwaccel cuvid -hwaccel_device 0 -c:v h264_cuvid -i ./video.mp4 \
-c:a aac -b:a 128k -c:v h264_nvenc -b:v 3500k -maxrate 3500k -preset fast -profile:v baseline -g 30 -r 30 out_1080.mp4
Transcoding command:
time ffmpeg -y -v info -hwaccel cuvid -hwaccel_device 0 -c:v h264_cuvid -i ./video.mp4 \
-c:a aac -b:a 128k -c:v h264_nvenc -b:v 3500k -maxrate 3500k -preset fast -profile:v baseline -g 30 -r 30 out_1080.mp4 \
-c:a aac -b:a 128k -c:v h264_nvenc -b:v 2000k -maxrate 2000k -vf hwupload,scale_npp=w=1280:h=720 -preset fast -profile:v baseline -g 30 -r 30 out_720.mp4 \
-c:a aac -b:a 128k -c:v h264_nvenc -b:v 1000k -maxrate 1000k -vf hwupload,scale_npp=w=854:h=480 -preset fast -profile:v baseline -g 30 -r 30 out_480.mp4
Specs from datasheet:
- 16 GB GDDR6 256-bit 448 GB/s
- Cuda Cores (Ampere): 6,144
- Tensor cores (3rd gen): 192
- RT Cores (2nd gen): 48
- Single-precision performance: 19.2 TFLOPS
- Total power consumption: 140W
- Encoding streams: unlimited
Additional power: PCIe 6-pin
PCIe slots: 1
Purchase price: $2053
~ 317 MB VRAM ± 65-74W
frame=149565 fps=825 q=22.0 Lsize= 1703147kB time=01:23:05.46 bitrate=2798.6kbits/s dup=5 drop=0 speed=27.5x
video:1627352kB audio:71575kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.248380%
[aac @ 0x562f54b56c40] Qavg: 7124.271
real 3m1.579s
user 1m8.791s
sys 0m3.517s
~ 379 MB VRAM ± 78-90W
frame=149565 fps=436 q=22.0 Lq=21.0 q=23.0 size= 1703147kB time=01:23:05.46 bitrate=2798.6kbits/s dup=15 drop=0 speed=14.5x
video:3157909kB audio:214724kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[aac @ 0x55c0d10231c0] Qavg: 7124.271
[aac @ 0x55c0d101ed80] Qavg: 7124.271
[aac @ 0x55c0d1034280] Qavg: 7124.271
real 5m43.207s
user 3m14.678s
sys 0m4.525s
Specs from datasheet:
- 6 GB GDDR6 192-bit 288 GB/s
- Cuda Cores (Ampere): 3,328
- Tensor cores (3rd gen): 104
- RT Cores (2nd gen): 26
- Single-precision performance: 8.0 TFLOPS
- Total power consumption: 70W
- Encoding streams: unlimited
Additional power: not required
PCIe slots: 2
Purchase price: $964
~ 256 MB VRAM ± 55W
frame=149565 fps=816 q=22.0 Lsize= 1703147kB time=01:23:05.46 bitrate=2798.6kbits/s dup=5 drop=0 speed=27.2x
video:1627352kB audio:71575kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.248380%
[aac @ 0x5619cec43cc0] Qavg: 7124.271
real 3m3.436s
user 1m7.428s
sys 0m2.941s
UPD8: Launching the same test on i9-12900K CPU but on PCIe 3.0 4x lane resulted in 8% fps penalty. Seems like PCIe bandwidth matters a lot here.
~ 317 MB VRAM ± 65W
frame=149565 fps=460 q=22.0 Lq=21.0 q=23.0 size= 1703147kB time=01:23:05.46 bitrate=2798.6kbits/s dup=15 drop=0 speed=15.3x
video:3157909kB audio:214724kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[aac @ 0x5596ac0e11c0] Qavg: 7124.271
[aac @ 0x5596ac0dcd80] Qavg: 7124.271
[aac @ 0x5596ac0f2280] Qavg: 7124.271
real 5m25.119s
user 2m51.941s
sys 0m4.682s
Specs from datasheet:
- 8 GB GDDR6 192-bit 416 GB/s
- Cuda Cores (Turing): 2,304
- Tensor cores (? gen): 288
- RT Cores (? gen): 36
- Single-precision performance: 7.1 TFLOPS
- Total power consumption: 125W
- Encoding streams: unlimited
Additional power: PCIe 6+2-pin
PCIe slots: 1
Purchase price: $1287
~ 256 MB VRAM ± 61W
frame=149565 fps=876 q=22.0 Lsize= 1737670kB time=01:23:05.46 bitrate=2855.3kbits/s dup=5 drop=0 speed=29.2x
video:1661875kB audio:71575kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.243433%
[aac @ 0x55d81d057cc0] Qavg: 7124.271
real 2m51.125s
user 1m7.820s
sys 0m3.456s
~ 315 MB VRAM ± 70W
frame=149565 fps=492 q=22.0 Lq=21.0 q=23.0 size= 1737670kB time=01:23:05.46 bitrate=2855.3kbits/s dup=15 drop=0 speed=16.4x
video:3214019kB audio:214724kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[aac @ 0x562a5beda1c0] Qavg: 7124.271
[aac @ 0x562a5bed5d80] Qavg: 7124.271
[aac @ 0x562a5beeb280] Qavg: 7124.271
real 5m4.469s
user 2m48.625s
sys 0m4.851s
Specs from datasheet:
- 4 GB GDDR6 128-bit 160 GB/s
- Cuda Cores (Turing): 896
- Tensor cores: none
- RT Cores: none
- Single-precision performance: <= 2.5 TFLOPS
- Total power consumption: 50W
- Encoding streams: 3
Additional power: not required
PCIe slots: 1
Purchase price: $435
~ 209 MB VRAM <= 50W (nvidia-smi reports N/A)
frame=149565 fps=732 q=21.0 Lsize= 1732661kB time=01:23:05.46 bitrate=2847.1kbits/s dup=5 drop=0 speed=24.4x
video:1656867kB audio:71575kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.244139%
[aac @ 0x55f4f4a0ecc0] Qavg: 7124.271
real 3m24.583s
user 1m12.531s
sys 0m3.221s
~ 273 MB VRAM <= 50W (nvidia-smi reports N/A)
frame=149565 fps=405 q=21.0 Lq=20.0 q=22.0 size= 1732661kB time=01:23:05.46 bitrate=2847.1kbits/s dup=15 drop=0 speed=13.5x
video:3269555kB audio:214724kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[aac @ 0x564098f571c0] Qavg: 7124.271
[aac @ 0x564098f52d80] Qavg: 7124.271
[aac @ 0x564098f68280] Qavg: 7124.271
real 6m9.938s
user 3m38.598s
sys 0m5.236s
Disclaimer: this test was made on the same video but on a bit different server with i9-12900K CPU and 3600 Mhz RAM. Driver version: 510.47.03
Specs from datasheet:
- 11 GB GDDR6 352-bit 616 GB/s
- Cuda Cores (Turing): 4352
- Tensor cores: 544 ?
- RT Cores: 68 ?
- Single-precision performance: 13.45 TFLOPS?
- Total power consumption: 260W
- Encoding streams: 3
Additional power: PCIe 2x 6+2-pin
PCIe slots: 3
Purchase price: N/A
~ 370 MB VRAM ±93W
frame=149565 fps=911 q=22.0 Lsize= 1737670kB time=01:23:05.46 bitrate=2855.3kbits/s dup=5 drop=0 speed=30.4x
video:1661875kB audio:71575kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.243433%
[aac @ 0x55a480458c40] Qavg: 7124.271
real 2m44.401s
user 1m21.541s
sys 0m2.731s
~ 370 MB VRAM ±104W
frame=149565 fps=521 q=22.0 Lq=21.0 q=23.0 size= 1737670kB time=01:23:05.46 bitrate=2855.3kbits/s dup=15 drop=0 speed=17.4x
video:3214019kB audio:214724kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[aac @ 0x55ea220f81c0] Qavg: 7124.271
[aac @ 0x55ea220f3d80] Qavg: 7124.271
[aac @ 0x55ea22109280] Qavg: 7124.271
real 4m47.268s
user 3m17.701s
sys 0m4.718s
Disclaimer: this test was made on the same video but on a bit different server with AMD Ryzen 9 3950X and some load on it(but no nvenc load). Driver version: 450.119.03
Specs from datasheet:
- 8 GB GDDR5 256-bit 243 GB/s
- Cuda Cores (Pascal): 1792
- Tensor cores: none
- RT Cores: none
- Single-precision performance: 5.2 TFLOPS
- Total power consumption: 105W
- Encoding streams: unlimited
Additional power: PCIe 6-pin
PCIe slots: 1
Purchase price: N/A
~ 234 MB VRAM ±43W
frame=149565 fps=644 q=21.0 Lsize= 1732661kB time=01:23:05.46 bitrate=2847.1kbits/s dup=5 drop=0 speed=21.5x
video:1656867kB audio:71575kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.244139%
[aac @ 0x56269dd00c40] Qavg: 7124.271
real 3m52.991s
user 1m38.497s
sys 0m3.769s
~ 294 MB VRAM ±50W
frame=149565 fps=384 q=21.0 Lq=20.0 q=22.0 size= 1732661kB time=01:23:05.46 bitrate=2847.1kbits/s dup=15 drop=0 speed=12.8x
video:3269555kB audio:214724kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[aac @ 0x55cfae2e21c0] Qavg: 7124.271
[aac @ 0x55cfae2ddd80] Qavg: 7124.271
[aac @ 0x55cfae2f3280] Qavg: 7124.271
real 6m30.079s
user 3m59.463s
sys 0m5.913s
For video encoding NVIDIA T1000
(while having only 3 encoding streams) is the perfect cost/fps.
If you need some real CUDA and/or Tensor cores - NVIDIA RTX A2000
is your choice. They both doesn't require additional PCIe power connectors.
For home-usage in some little machine learning things you choose between cheaper 8GB RTX 4000
and more expensive 16GB NVIDIA RTX A4000
.