@Brainiarc7
Last active November 11, 2024 15:31
This gist will show you how to launch multiple ffmpeg instances with xargs, very useful for NVIDIA NVENC based encoding where standard GPUs limit the maximum simultaneous encode sessions to two.

Spawning multiple ffmpeg processes with xargs:

On standard NVIDIA GPUs (that is, not the Quadro and Tesla lines), NVENC is limited to two simultaneous encode sessions. The sample below shows how to pass a list of AVI files to ffmpeg and encode them to HEVC using two sessions in parallel:

$ find Videos/ -type f -name \*.avi -print | sed 's/\.avi$//' |\
  xargs -I@ -P 2 ffmpeg -i "@.avi" -c:a aac -c:v hevc_nvenc "@.mp4"

This finds all files ending in .avi under the directory Videos/ and transcodes them into HEVC (H.265) + AAC files ending in .mp4. The noteworthy part is the -P 2 option to xargs, which runs up to two processes in parallel.

You may modify the command above to specify any other file wildcards to suit your case.
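
Before committing GPU time to a large batch, it can help to preview what the pipeline would run. The following is a minimal sketch and not part of the original command: preview_encodes is a hypothetical helper name, and it simply places echo in front of ffmpeg so that the command lines are printed instead of executed. Note that echo drops the quotes, so paths containing spaces will look unquoted in the preview.

```shell
# Dry run: print the ffmpeg command lines that would be launched.
# preview_encodes is a hypothetical helper; the directory defaults to Videos/.
preview_encodes() {
  dir="${1:-Videos/}"
  find "$dir" -type f -name '*.avi' -print | sed 's/\.avi$//' | sort |
    xargs -I@ echo ffmpeg -i "@.avi" -c:a aac -c:v hevc_nvenc "@.mp4"
}
```

Calling preview_encodes Videos/ lists one ffmpeg invocation per .avi file found; once the list looks right, the same pipeline can be run without the echo.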

Note that you can also combine the encode process with CUDA's accelerated NPP for faster image scaling on top of it. So instead of doing this:

ffmpeg -i input.avi \
      -c:a aac -b:a 128k \
      -s hd480 \
      -c:v hevc_nvenc -b:v 1024k \
      -y output.mp4

You can now replace the "-s hd480" with a (somewhat complicated) filter and do:

ffmpeg -i input.avi \
      -c:a aac -b:a 128k \
      -filter:v hwupload_cuda,scale_npp=w=852:h=480:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
      -c:v hevc_nvenc -b:v 1024k \
      -y output.mp4

This uploads each frame to the GPU, where scale_npp scales it with the Lanczos algorithm to 852x480 pixels (the same size as hd480), and then downloads the result back to system memory. A pixel format must be specified for this to work; both "nv12" and "yuv420p" can be used. "nv12" seems to run a bit faster than "yuv420p", but both formats are accepted by the NVENC encoder.
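
Because the pixel format appears twice in the filter chain (once inside scale_npp and once after hwdownload), it is easy to change one and forget the other. As a small sketch, assuming the same chain as above, the hypothetical helper npp_scale_filter below builds the filter string from a width, a height, and an optional format so that both occurrences always agree.

```shell
# Build the scale_npp filter chain used above.
# npp_scale_filter is a hypothetical helper; the format defaults to nv12.
npp_scale_filter() {
  w="$1"
  h="$2"
  fmt="${3:-nv12}"
  printf 'hwupload_cuda,scale_npp=w=%s:h=%s:format=%s:interp_algo=lanczos,hwdownload,format=%s\n' \
    "$w" "$h" "$fmt" "$fmt"
}
```

The result can then be passed to ffmpeg as -filter:v "$(npp_scale_filter 852 480)".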

Now, by running two ffmpeg processes in parallel and using CUDA-based NPP (NVIDIA Performance Primitives; see the build guide) for scaling, you can rapidly encode videos on an NVENC-capable GPU:

find Videos/ -type f -name \*.avi -print0 | sed -z 's/\.avi$//' | xargs -0 -I@ -P 2 \
   ffmpeg -i "@.avi" \
      -c:a aac -b:a 128k \
      -filter:v hwupload_cuda,scale_npp=w=852:h=480:format=nv12:interp_algo=lanczos,hwdownload,format=nv12 \
      -c:v hevc_nvenc -b:v 1024k \
      -y "@.mp4"

Of course, if you have a Tesla or Quadro GPU, you can launch more than two simultaneous encode sessions as the need arises. Simply change the value passed to -P in the xargs call to the number of simultaneous ffmpeg processes you want.
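
One way to make the session count adjustable is to wrap the pipeline in a small function. This is a rough sketch under stated assumptions rather than a definitive script: encode_all and its three parameters are hypothetical, the -0 and -P options need GNU or BSD xargs, and the output name is derived inside sh -c instead of with sed so that file names containing spaces survive. Passing echo as the third argument prints the arguments instead of encoding.

```shell
# encode_all DIR [JOBS] [FFMPEG_BIN]   (hypothetical wrapper)
#   DIR        directory to search for .avi files (default: Videos/)
#   JOBS       number of parallel ffmpeg processes (default: 2)
#   FFMPEG_BIN program to run (default: ffmpeg; use "echo" for a dry run)
encode_all() {
  dir="${1:-Videos/}"
  jobs="${2:-2}"
  ffmpeg_bin="${3:-ffmpeg}"
  find "$dir" -type f -name '*.avi' -print0 |
    xargs -0 -P "$jobs" -I@ sh -c '
      # $1 is the program to run, $2 is the input file
      "$1" -nostdin -i "$2" -c:a aac -c:v hevc_nvenc -y "${2%.avi}.mp4"
    ' sh "$ffmpeg_bin" @
}
```

For example, encode_all Videos/ 2 keeps the two-session behaviour described above, while a larger second argument suits GPUs that allow more sessions.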

Have fun.

@oguya

oguya commented Dec 8, 2016

Good use of xargs 👍 . I've used it before to parallelize rsync.

@Brainiarc7
Author

xargs is a life-saver, yo.

@Tinkery

Tinkery commented Oct 13, 2017

Looking for a little guidance on ffmpeg. The post is awesome by the way, and I've been looking for something like this. I have 2 questions for anyone who will take the time.

  1. I'm using a massive server with 512 GB of RAM. Would tapping in to the video card really help with transcoding? I don't really have a great video card, yet. It's just a NVIDIA GeForce GT 720. Plan to try and get a couple of beefier ones in there though, if it'll help.

  2. Can this be done with just the CPU? I'm trying to write a line that allows me to launch multiple instances of ffmpeg for transcoding. It's simple transcoding (I was transcoding 1GB AVC files to MP4, and resizing at the same time). I'd like to use the CPU power (4-10 core processors) to get more done simultaneously, and not consecutively. I'm not a scripting guru, and this post was the first I heard of xargs, just to set expectations. I did create 10 separate batch files and kicked those off, but I'm sure there is a better way. The CPU was almost maxed out but RAM was hardly touched at all. Just trying to see what I can do to improve performance, i.e. speed. I have almost 4000 files to convert.

Still learning where all the ffmpeg transcoding, etc., is done (CPU, v. RAM, etc.), and what's more important to tackle. Thanks, and I'm open to all ideas.

@Brainiarc7
Author

Hello @Tinkery,

You can reach me through email (it's on my Github profile) for further discussion.

@lojasyst

lojasyst commented Mar 2, 2018

Hello Brainiarc7, would this concept work with live streams?

I do have a GTX1070 nvidia card.

@Brainiarc7
Author

Yes, it would.
