FFmpeg live streaming to YouTube via Nvidia's NVENC and Intel's VAAPI on supported hardware

Streaming your Linux desktop to YouTube and Twitch via Nvidia's NVENC and VAAPI:

Considerations to take when live streaming:

The following best practices apply when using a hardware-based encoder for live streaming to any platform:

  1. Set the buffer size (-bufsize:v) equal to the target bitrate (-b:v) to ensure that you're encoding in CBR mode.

  2. Set up the encoders as shown:

(a). Omit the presets as they'll override the desired rate control mode.

(b). Use a rate control mode that enforces a constant bitrate. That way, we can provide a stream with a fixed bitrate (per variant, where applicable, with multiple outputs as shown in some of the examples below).

(c). For multiple outputs (where enabled), call up the versatile tee muxer to output to multiple streaming services (or to a file, if so desired). Substitute all variables (such as stream keys) with your own.

(d). The thread count is also lowered to ensure that VBV underflows do not occur. Note that higher thread counts for hardware-accelerated encoders and decoders have rapidly diminishing returns.

(e). On consumer SKUs, NVENC is limited to two concurrent encode sessions. This is by design; you can override this artificial limit by using this project. This does not apply to the likes of AMD's VCE (via VAAPI) or Intel's VAAPI and QSV implementations.

Part 1: With Nvidia's NVENC:

Without scaling the output:

If you want the output video frame size to be the same as the input for Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 2400k -minrate:v 2400k -maxrate:v 2400k -bufsize:v 2400k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for YouTube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2  -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note: Ensure that the size specified above (-s) matches your screen's resolution. Also, ensure that the stream key used is correct, otherwise your live broadcast will fail.
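If you'd rather not hardcode the capture size, it can be derived from xdpyinfo. This is a minimal sketch: the parse is demonstrated against a sample xdpyinfo line (the live command is shown in a comment), and the exact "dimensions:" line format is an assumption about your xdpyinfo output:

```shell
# Sketch: derive the x11grab capture size instead of hardcoding -s 1920x1080.
sample='  dimensions:    1920x1080 pixels (508x285 millimeters)'
size=$(printf '%s\n' "$sample" | awk '/dimensions:/{print $2}')
# size=$(xdpyinfo | awk '/dimensions:/{print $2}')   # on a live X11 session
echo "$size"
```

You can then pass -s "$size" to the x11grab input.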

You can optionally capture from the first KMS device (as long as you're a member of the video group) as shown:

For Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-device /dev/dri/card0 -f kmsgrab -i - \
-thread_queue_size 1024 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for YouTube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-device /dev/dri/card0 -f kmsgrab -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Scaling the output:

If you want the output video frame size to be smaller than the input then you can insert the appropriate scale video filter for NVENC:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for YouTube:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note that the examples above will use the CUDA-based NPP acceleration in the GPU to re-scale the output to a perfect 720p (HD-ready) video stream on broadcast. This will require you to build ffmpeg with support for both NVENC and the CUDA SDK.
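Since both NVENC and the NPP scaler must be compiled in, it's worth checking the build before going live. A minimal sketch: the sample line mimics typical `ffmpeg -encoders` output, and the live commands appear as comments:

```shell
# Sketch: confirm your ffmpeg build exposes the NVENC encoder and the NPP scaler.
# ffmpeg -hide_banner -encoders | grep nvenc       # live check for NVENC
# ffmpeg -hide_banner -filters  | grep scale_npp   # live check for NPP scaling
sample=' V..... h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)'
if printf '%s\n' "$sample" | grep -q 'h264_nvenc'; then
  echo "h264_nvenc available"
fi
```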

With webcam overlay:

This will place your webcam overlay in the top right:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for YouTube:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-filter:v "hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos" \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

You can see additional details about your webcam with something like:

ffmpeg -f v4l2 -list_formats all -i /dev/video0 

Or with:

v4l2-ctl --list-formats-ext

See the documentation on the video4linux2 (v4l2) input device for more info.

With webcam overlay and logo:

This will place your webcam overlay in the top right, and a logo in the bottom left:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-i logo.png -filter_complex \
"[0:v]format=nv12,hwupload_cuda,scale_npp=1024:-1,hwdownload,format=nv12,setpts=PTS-STARTPTS[bg]; \
[1:v]format=nv12,hwupload_cuda,scale_npp=120:-1,hwdownload,format=nv12,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for YouTube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-i logo.png -filter_complex \
"[0:v]format=nv12,hwupload_cuda,scale_npp=1024:-1,hwdownload,format=nv12,setpts=PTS-STARTPTS[bg]; \
[1:v]format=nv12,hwupload_cuda,scale_npp=120:-1,hwdownload,format=nv12,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Streaming a file to YouTube:

ffmpeg -re -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-i input.mkv \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Outputting to multiple streaming services & local file:

You can use the tee muxer to efficiently stream to multiple sites and save a local copy if desired. Using tee will allow you to encode only once and send the same data to multiple outputs. Using the onfail option will allow the other streams to continue if one fails.

ffmpeg -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-i input -map 0 -bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-c:v h264_nvenc -qp:v 19  \
-profile:v high -rc:v cbr_ld_hq -level:v 4.2 -r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f tee \
"[f=flv:onfail=ignore]rtmp://live.twitch.tv/app/<stream key>|[f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>|local_file.mkv"

The example above will stream to both YouTube and Twitch and, at the same time, store a copy of the video stream on the local filesystem. Modify paths as needed.

Notes:

You can use xwininfo | grep geometry to select the target window and get placement coordinates. For example, an output of -geometry 800x600+284+175 would result in using -video_size 800x600 -i :0.0+284,175. You can also use it to automatically enter the input screen size: -video_size $(xwininfo -root | awk '/-geo/{print $2}').
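The geometry parsing above can be sketched in plain POSIX shell. The geometry value is hardcoded to the example output from the text; on a live system it would come from xwininfo:

```shell
# Sketch: split an xwininfo -geometry value into x11grab arguments.
geom='800x600+284+175'            # live: geom=$(xwininfo | awk '/-geo/{print $2}')
size=${geom%%+*}                  # 800x600
rest=${geom#*+}                   # 284+175
x=${rest%%+*}
y=${rest#*+}
echo "-video_size $size -i :0.0+$x,$y"
```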

The pulse input device (requires --enable-libpulse) can be an alternative to the ALSA input device, as in:

-f pulse -i default

As per your preference.

Part 2: Using Intel's VAAPI to achieve the same:

Using x11grab as shown below:

Without scaling the output:

If you want the output video frame size to be the same as the input for Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for YouTube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

With scaling the output:

If you want the output video frame size to be scaled for Twitch:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

To do the same for YouTube, do:

ffmpeg -loglevel debug -threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note: Ensure that you've granted the CAP_SYS_ADMIN capability (via setcap) to your FFmpeg binary if you intend to capture from KMS devices:

sudo setcap cap_sys_admin+ep /path/to/ffmpeg

Note that KMS capture is highly prone to failure on multi-GPU systems when using direct device derivation for VAAPI encoder contexts, as shown below. Where possible, stick to x11grab instead.
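You can confirm the capability actually stuck on the binary with getcap. A sketch: the path is the placeholder from the setcap line above, and the expected getcap output format is parsed from a hardcoded sample here, with the live command in a comment:

```shell
# Sketch: verify CAP_SYS_ADMIN was granted before attempting kmsgrab.
sample='/path/to/ffmpeg cap_sys_admin=ep'
# sample=$(getcap /path/to/ffmpeg)   # on a live system
case "$sample" in
  *cap_sys_admin*) echo "CAP_SYS_ADMIN granted" ;;
  *) echo "capability missing: kmsgrab will fail" ;;
esac
```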

Tips with using KMS:

KMS surfaces can be mapped directly to VAAPI, as shown in the example below, with scaling enabled:

ffmpeg -threads:v 2 -threads:a 8 -filter_threads 2 \
-framerate 60 -f kmsgrab -i - \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1920:h=1080:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16  \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Also, ensure that the stream key used is correct, otherwise your live broadcast will fail.

Scaling the output:

If you want the output video frame size to be smaller than the input then you can insert the appropriate scale video filter for VAAPI:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f kmsgrab -framerate 60 -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for YouTube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f kmsgrab -framerate 60 -i - \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-vf 'hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Note that the examples above will use Intel's VAAPI-based scaling engines (available since Sandybridge) in the GPU to re-scale the output to a perfect 720p (HD-ready) video stream on broadcast.
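Before relying on scale_vaapi and h264_vaapi, you can confirm the driver exposes an H.264 encode entrypoint with vainfo. This sketch parses a sample vainfo line (an assumed output format); run the commented command on a live system:

```shell
# Sketch: check for an H.264 VAAPI encode entrypoint.
sample='      VAProfileH264High               : VAEntrypointEncSlice'
# sample=$(vainfo 2>/dev/null | grep VAProfileH264High)   # on a live system
case "$sample" in
  *VAEntrypointEncSlice*) echo "H.264 VAAPI encode available" ;;
  *) echo "no VAAPI encode entrypoint" ;;
esac
```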

With webcam overlay:

This will place your webcam overlay in the top right:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 1024 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for YouTube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 60 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf  'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

You can see additional details about your webcam with something like:

ffmpeg -f v4l2 -list_formats all -i /dev/video0

Or with:

v4l2-ctl --list-formats-ext

See the documentation on the video4linux2 (v4l2) input device for more info.

Note: Your webcam may natively support the frame size you want to overlay onto the main video, so the webcam scaling shown in this example can be omitted (just set the appropriate v4l2 -video_size and remove the 120:-1 scaling step). See this answer on superuser for more details on the same.

With webcam overlay and logo:

This will place your webcam overlay in the top right, and a logo in the bottom left:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-i logo.png -filter_complex \
"[0:v]scale=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]scale=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=nv12,hwupload[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://live.twitch.tv/app/<stream key>

And for YouTube:

ffmpeg -loglevel debug \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0 \
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0 \
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0 \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-i logo.png -filter_complex \
"[0:v]scale=1024:-1,setpts=PTS-STARTPTS[bg]; \
[1:v]scale=120:-1,setpts=PTS-STARTPTS[fg]; \
[bg][fg]overlay=W-w-10:10[bg2]; \
[bg2][3:v]overlay=W-w-10:H-h-10,format=nv12,hwupload[out]" \
-map "[out]" -map 2:a \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Streaming a file to YouTube:

ffmpeg -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-re -i input.mkv -init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-c:a aac -b:a 160k -ac 2 \
-f flv rtmp://a.rtmp.youtube.com/live2/<stream key>

Outputting to multiple streaming services & local file:

You can use the tee muxer to efficiently stream to multiple sites and save a local copy if desired. Using tee will allow you to encode only once and send the same data to multiple outputs. Using the onfail option will allow the other streams to continue if one fails.

ffmpeg -threads:v 2 -threads:a 8 -filter_threads 2 \
-re -i input.mkv -init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-vf 'format=nv12,hwupload,scale_vaapi=w=1920:h=1080' \
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k \
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16 \
-c:a aac -b:a 160k -ac 2 -map 0 \
-f tee \
"[f=flv:onfail=ignore]rtmp://live.twitch.tv/app/<stream key>|[f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>|local_file.mkv"

The example above will stream to both YouTube and Twitch and, at the same time, store a copy of the video stream on the local filesystem. Modify paths as needed.

Note: For VAAPI, use the scaler on local file encodes only if you need it. For best results, stick to the original video stream dimensions, or downscale if needed (scale_vaapi=w=x:h=y), as upscaling can introduce artifacts.

To determine the video stream's properties, you can run:

ffprobe -i path-to-video.ext

Where path-to-video.ext refers to the absolute path and name of the file.
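ffprobe reports the frame rate as a fraction (e.g. 60000/1001). A sketch of converting that to a whole-number fps in shell; the sample value is hardcoded, and the commented line shows the live query:

```shell
# Sketch: convert ffprobe's r_frame_rate fraction to an integer fps.
rate='60000/1001'
# rate=$(ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of csv=p=0 path-to-video.ext)
fps=$(( ${rate%%/*} / ${rate##*/} ))   # integer division: 60000/1001 -> 59
echo "$fps"
```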

Extras:

Outputting to multiple streaming services can be achieved as shown, using a sample for both NVENC and VAAPI (as requested by @semeion below; substitute the scale values with the resolutions you want):

(a). NVENC:

ffmpeg -loglevel $LOGLEVEL \
-threads:v 2 -threads:a 8 -filter_threads 2 \
-f x11grab -thread_queue_size 512 -s "$INRES" -framerate "$FPS" -i :0.0 \
-f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
-filter_complex "split=2[a][b]; \
[a]hwupload_cuda,scale_npp=w=1280:h=720:interp_algo=lanczos[c]; \
[b]hwupload_cuda,scale_npp=w=1024:h=768:interp_algo=lanczos[d]" \
-b:v:0 2400k -minrate:v:0 2400k -maxrate:v:0 2400k -bufsize:v:0 2400k -c:v:0 h264_nvenc -qp:v:0 19  \
-profile:v:0 high -rc:v:0 cbr_ld_hq -level:v:0 4.1 -r:v 60 -g:v 120 -bf:v:0 3 -refs:v 16 \
-b:v:1 1500k -minrate:v:1 1500k -maxrate:v:1 1500k -bufsize:v:1 2400k -c:v:1 h264_nvenc -qp:v:1 19  \
-profile:v:1 high -rc:v:1 cbr_ld_hq -level:v:1 4.1 -r:v 60 -g:v 120 -bf:v:1 3 -refs:v 16 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-map "[c]" -map "[d]" -map "1:a" \
-f tee  \
"[select=\'v:0,a\':f=flv:onfail=ignore]rtmp://$SERVER.twitch.tv/app/$STREAM_KEY| \
[select=\'v:1,a\':f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>"

(b). VAAPI:

ffmpeg -loglevel $LOGLEVEL \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va \
-threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
-f x11grab -thread_queue_size 512 -s "$INRES" -framerate "$FPS" -i :0.0 \
-f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
-filter_complex "split=2[a][b]; \
[a]format=nv12,hwupload,scale_vaapi=w=1280:h=720[c]; \
[b]format=nv12,hwupload,scale_vaapi=w=1024:h=768[d]" \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
-b:v:0 2400k -minrate:v:0 2400k -maxrate:v:0 2400k -bufsize:v:0 2400k -c:v:0 h264_vaapi -qp:v:0 19  \
-profile:v:0 high -level:v:0 4.1 -r:v 60 -g:v 120 -bf:v:0 3 -refs:v 16 \
-b:v:1 1500k -minrate:v:1 1500k -maxrate:v:1 1500k -bufsize:v:1 2400k -c:v:1 h264_vaapi -qp:v:1 19  \
-profile:v:1 high -level:v:1 4.1 -r:v 60 -g:v 120 -bf:v:1 3 -refs:v 16 \
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k \
-map "[c]" -map "[d]" -map "1:0" \
-f tee  \
"[select=\'v:0,a\':f=flv:onfail=ignore]rtmp://$SERVER.twitch.tv/app/$STREAM_KEY| \
[select=\'v:1,a\':f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>"

Substitute the used variables with your own values.

Todo: Document the use of QuickSync (QSV) encoders in live-streaming.


semeion commented Jan 15, 2019

I have no idea why the command doesn't work. I am using two video cards:

lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)

My primary/active video card in X11 is the Intel one; the NVIDIA card is for CUDA processing only. Could that be the problem?

/etc/X11/xorg.conf.d/20-intel.conf

Section "Device"
  Identifier  "Intel Graphics"
  Driver      "intel"
  BusID       "PCI:00:2:0"
EndSection


semeion commented Jan 16, 2019

I think the problem is something with the input (depths (7): 24, 1, 4, 8, 15, 16, 32), but I don't know how to fix it:

screen #0:
  dimensions:    1920x1080 pixels (508x285 millimeters)
  resolution:    96x96 dots per inch
  depths (7):    24, 1, 4, 8, 15, 16, 32
  root window id:    0xf5
  depth of root window:    24 planes
  number of colormaps:    minimum 1, maximum 1
  default colormap:    0x22
  default number of colormap cells:    256
  preallocated pixels:    black 0, white 16777215
  options:    backing-store WHEN MAPPED, save-unders NO
  largest cursor:    256x256


dEROZA commented Jan 17, 2019

Is there a way to use RTMP for 4K UHD streaming to YouTube with h264_nvenc on my 1080 Ti? I tried writing to a file, but it doesn't work.

-preset hq -rc cbr -profile high -coder 1 -g 120 -bf 4 -movflags faststart

YouTube says that the bitrate is very low, but I set it to 51k.

I tested; 51k is a good bitrate when I write the stream to a file, but not over RTMP.

Brainiarc7 (Author):

@dEROZA,

Show me the full ffmpeg command you're running.

@semeion,

Correct your command as shown:

ffmpeg -loglevel $LOGLEVEL -init_hw_device cuda=0 -filter_hw_device 0 \
    -f x11grab -thread_queue_size 256 -s "$INRES" -framerate "$FPS" -i :0.0 \
    -f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
    -filter:v format=nv12,hwupload,scale_npp=w=1280:h=720:interp_algo=lanczos \
    -c:v h264_nvenc -preset:v llhq \
    -rc:v vbr_hq  \
    -f flv "rtmp://$SERVER.twitch.tv/app/$STREAM_KEY"

I'd assume you've built FFmpeg with CUDA enabled, correct?


semeion commented Jan 24, 2019

@Brainiarc7,

Thank you very much, your command is working like a charm.

Is it possible to transmit in two different resolutions, like 1280x720 and 1024x768 for example?

BTW, Thank you again, that command is amazing, saved my week :D

Brainiarc7 (Author):

@semeion,

Yes. I'll provide an example soon that takes advantage of the scale_npp filter paired with a -filter_complex chain for the same.

Brainiarc7 (Author):

@semeion, here it comes:

ffmpeg -loglevel $LOGLEVEL -init_hw_device cuda=0 -filter_hw_device 0 \
    -threads:v 2 -threads:a 8 -filter_complex_threads 2 -filter_threads 2 \
    -f x11grab -thread_queue_size 256 -s "$INRES" -framerate "$FPS" -i :0.0 \
    -f alsa -thread_queue_size 512 -ac 2 -i hw:0,0 \
    -filter_complex "split=2[a][b]; \
    [a]format=nv12,hwupload,scale_npp=w=1280:h=720:interp_algo=lanczos,hwdownload,format=nv12[c]; \
    [b]format=nv12,hwupload,scale_npp=w=1024:h=768:interp_algo=lanczos,hwdownload,format=nv12[d]" \
   -bsf:a aac_adtstoasc -c:a aac -ac 2 -ar 48000 -b:a 128k \
  -b:v:0 2400k -minrate:v:0 2400k -maxrate:v:0 2400k -bufsize:v:0 160k -c:v:0 h264_nvenc -pix_fmt nv12 -qp:v:0 19  \
  -profile:v:0 high -rc:v:0 cbr_ld_hq -level:v:0 4.1 -r 60 -g 120 -bf:v:0 3 \
  -b:v:1 1500k -minrate:v:1 1500k -maxrate:v:1 1500k -bufsize:v:1 100k -c:v:1 h264_nvenc -pix_fmt nv12 -qp:v:1 19  \
  -profile:v:1 high -rc:v:1 cbr_ld_hq -level:v:1 4.1 -r 60 -g 120 -bf:v:1 3 \
  -map "[c]" -map "[d]" -map "0:a" \
  -f tee  \
  "[select=\'v:0,a\':f=flv:onfail=ignore]rtmp://$SERVER.twitch.tv/app/$STREAM_KEY| \
[select=\'v:1,a\':f=flv:onfail=ignore]rtmp://a.rtmp.youtube.com/live2/<stream key>"

Considerations:

  1. Compute the optimal buffersize (-bufsize:v) for hardware-based encoders as shown:

bufsize = 4 * (target bitrate / frame rate)

  2. The NVENC encoder is set up as shown:

(a). Omit the presets as they'll override the desired rate control mode.

(b). Set the rate control mode to cbr_ld_hq. That way, we can provide a stream with a perfect fixed bitrate per variant.

(c). For outputs, call up the versatile tee muxer, outputting to both Youtube and Twitch. Substitute all variables (such as stream keys) with your own.

(d). The thread count is also lowered to ensure the accuracy of the VBV buffer. A higher thread count for hardware-accelerated encoders has rapidly diminishing returns.

(e). The split filter is used to create two outputs from one input, so that we can generate two encoder variants as needed.

(f). On consumer SKUs, NVENC is limited to two encode sessions only. This is by design. You can override this by using this project.
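The buffer-size rule in point 1 can be sketched as a quick shell calculation; for the 2400k @ 60 fps variant in the command above it yields the 160k bufsize used there:

```shell
# Sketch of the rule of thumb: bufsize = 4 * (target bitrate / frame rate).
bitrate_kbps=2400
fps=60
bufsize=$(( 4 * bitrate_kbps / fps ))
echo "${bufsize}k"
```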

Brainiarc7 (Author):

The gist has been updated. Retest if you're running into errors.


semeion commented Feb 13, 2019

It's amazing! Thank you very much!

Brainiarc7 (Author):

You're welcome @semeion

Brainiarc7 (Author):

See the updates to the gist.
Unnecessary hwupload filter instances have been omitted. Also, new notes on webcam pixel format support have been added.


nrdxp commented Nov 8, 2019

Any way to have a webcam overlay with kmsgrab?

kokoko3k:

I tried to use kmsgrab with nvidia, but it failed that way:

    koko@slimer# sudo cat /sys/module/nvidia_drm/parameters/modeset
    Y

    koko@slimer# sudo setcap cap_sys_admin+ep /usr/bin/ffmpeg 

    koko@slimer# ffmpeg  -device /dev/dri/card0 -f kmsgrab -i  - -f image2pipe -
    ffmpeg version n4.2.1 Copyright (c) 2000-2019 the FFmpeg developers
      built with gcc 9.2.0 (GCC)
      configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libjack --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-nvdec --enable-nvenc --enable-omx --enable-shared --enable-version3
      libavutil      56. 31.100 / 56. 31.100
      libavcodec     58. 54.100 / 58. 54.100
      libavformat    58. 29.100 / 58. 29.100
      libavdevice    58.  8.100 / 58.  8.100
      libavfilter     7. 57.100 /  7. 57.100
      libswscale      5.  5.100 /  5.  5.100
      libswresample   3.  5.100 /  3.  5.100
      libpostproc    55.  5.100 / 55.  5.100
    [kmsgrab @ 0x56355affb7c0] Failed to open DRM device.
    pipe:: Invalid argument

I've posted the request to the official forum, but the developers are silent.
Is it failing due to a limitation of the nvidia driver, or am I doing something wrong?

Thanks!

koreanfan:

Hello. I tried your guide with modifications for an AMD card with VAAPI. If I use VAAPI together with libx264, I can stream to YouTube and get a green (good) stream health indicator; the same goes for libx264 alone. But when I stream with VAAPI only, the indicator goes orange, then red, with a notification like "video buffering or low speed", and the video of the broken stream shows up in my videos after 8-10 minutes, or not at all. When the indicator goes red with VAAPI only, I immediately stop and restart the stream with libx264 (with or without VAAPI) and it works well, so this is not a problem with my internet connection. Something is wrong with the VAAPI driver, I guess, or something else. I have tried many setups for streaming with VAAPI alone without good results; all the streams are broken. Yet when I use VAAPI to capture the screen to a file, everything works fine. If I use OBS on Linux, VAAPI streaming only works with a low bitrate (<4000) at 30 fps for 1920x1080, or 60 fps for 720p; with OBS on Windows I can set a higher frame rate, resolution, and bitrate because it uses the AMD AMF encoder. Why can everyone else stream with ffmpeg VAAPI from a Linux terminal but not me? Video card: AMD RX 560, CPU FX-8300, 8 GB RAM, so it's not a very rare PC and VAAPI should work. Please help me.


Brainiarc7 commented Feb 3, 2020 via email

koreanfan:

The AMD Pro drivers on Linux give very low fps in games with OpenGL. To use Vulkan with the AMD drivers you need a high-speed drive like an SSD, because Vulkan uses a disk buffer for pre-caching resources; so I can't use Vulkan, since I get low fps and freezes with that API. I installed Debian bullseye with the latest OBS, and I can stream now with ffmpeg VAAPI.

koreanfan:

I tried the ffmpeg command line again. How do you stream a specific application, and not the entire desktop, with ffmpeg on Linux? On the internet I only found that you can grab the whole desktop or a region of it, but not a specific application window.

koreanfan:

I tried your code:
With webcam overlay and logo:

This will place your webcam overlay in the top right, and a logo in the bottom left:

ffmpeg -loglevel debug
-threads:v 2 -threads:a 8 -filter_threads 2
-thread_queue_size 512 -f x11grab -s 1920x1080 -framerate 60 -i :0.0
-f v4l2 -video_size 320x240 -framerate 30 -i /dev/video0
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=i965 -filter_hw_device va
-i logo.png -filter_complex
"[0:v]format=nv12,hwupload,scale_vaapi=1024:-1,setpts=PTS-STARTPTS[bg];
[1:v]format=nv12,hwupload,scale_vaapi=120:-1,setpts=PTS-STARTPTS[fg];
[bg][fg]overlay=W-w-10:10[bg2];
[bg2][3:v]overlay=W-w-10:H-h-10,format=yuv420p[out]"
-map "[out]" -map 2:a
-thread_queue_size 512 -f alsa -ac 2 -i hw:0,0
-bsf:a aac_adtstoasc -c:a aac -ac 2 -b:a 128k
-c:v h264_vaapi -b:v 6000k -minrate:v 6000k -maxrate:v 6000k -bufsize:v 6000k
-r:v 60 -g:v 120 -bf:v 3 -refs:v 16
-f flv rtmp://live.twitch.tv/app/

But it doesn't work. I can't figure out the proper code for transitioning from software encoding to hardware-accelerated VAAPI for:
ffmpeg -y -f x11grab -video_size 1920x1080 -framerate 30 -i $DISPLAY
-f pulse -ac 2 -i default -i logo.png -i screenlogo.png -filter_complex
"[0:v]scale=1280:-1,setpts=PTS-STARTPTS[bg];[2:v]scale=162:-1,setpts=PTS-STARTPTS[bg2];[3:v]scale=120:-1,setpts=PTS-STARTPTS[bg3];
[bg][bg2]overlay=0:H-h[bg4];[bg4][bg3]overlay=W-w:0,format=yuv420p[v]"
-map "[v]" -map 1:a -c:v libx264 -g 60 -preset ultrafast
-b:v 3M -maxrate 3M -c:a aac -b:a 160k -ar 44100 -b:a 128k
-f flv x264.flv


Brainiarc7 commented Jun 15, 2020 via email

koreanfan:

Ok

koreanfan:

Also, I have a mouse acceleration issue when I use the VAAPI encoder with OBS or ffmpeg to stream/record Dota 2 with the OpenGL renderer. I created these topics:
daniel-schuermann/mesa#184
ValveSoftware/Dota-2#1770
http://ffmpeg.org/pipermail/ffmpeg-user/2020-June/048983.html
I think this is a video driver or X11 bug, but others said it is a Dota 2 bug.


simpletvlc:

ffmpeg -re -i https://radio.radyotvonline.net/radio/playlist.m3u8 -stream_loop -1 -i /home/media/video.mov -c:v libx264 -pix_fmt yuv420p -preset ultrafast -c:a mp3 -ab 128k -flvflags no_duration_filesize -f flv rtmp://127.0.0.1:25462/live/stream

I overlay the .mov on the radio URL and stream it. How can I do this completely with the GPU on the server?
