Rendering in Blender on a machine with multiple GPUs

@Fweeb, last active October 6, 2024

So here's the premise: For scenes that take around a minute or less to render, performance is actually worse if you render on all of the cards with a single instance of Blender. This is because (AFAIK) there's a bit of additional time necessary to collect the render results from each card and stitch them together. That time is a fixed short duration, so it's negligible on larger/longer render jobs. However, on shorter render jobs, the 'stitch time' has a much more significant impact.

I ran into this with a machine I render on that has 4 Quadro K6000s in it. To render animations, I ended up writing a few little scripts to facilitate launching 4 separate instances of Blender, each one tied to one GPU. Overall rendertime was much shorter with that setup than one instance of Blender using all 4 GPUs.

The setup works basically like this... I have the following Python script (it can be anywhere on your hard drive, so long as you remember the path to it). I call it batch_cycles.py:

import sys
import bpy

# The first argument after "--" on the command line is the CUDA device index.
# Assumes the .blend file is already set up to render with GPU Compute.

argv = sys.argv[sys.argv.index("--") + 1:]

if len(argv) > 0:

    dev = int(argv[0])
    C = bpy.context
    # Note: on Blender 2.80+ this is C.preferences rather than C.user_preferences
    cycles_prefs = C.user_preferences.addons['cycles'].preferences

    # Skip frames that already have files or placeholders so parallel
    # instances don't render the same frames
    C.scene.render.use_overwrite = False
    C.scene.render.use_placeholder = True
    cycles_prefs.compute_device_type = "CUDA"

    # Disable every compute device except the one we were handed
    for device in cycles_prefs.devices:
        device.use = False

    cycles_prefs.devices[dev].use = True

This script takes a single compute device (an integer index) as its input argument (everything after the -- on the command line). It disables every compute device except the one specified, so only that GPU is used by this running instance of Blender. The use_overwrite and use_placeholder settings are also critical to have in there; otherwise, all of the Blender instances would end up rendering the same frames.
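For example, a single instance tied to the first GPU would be launched like this (scene.blend and the script path are just placeholders):

# Everything after "--" is ignored by Blender itself and handed to the Python
# script, which reads it as the CUDA device index
blender -b scene.blend --python /path/to/batch_cycles.py -a -- 0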

With the batch_cycles.py script created and accessible, I just need a script to launch 4 instances of Blender. As I'm on Linux, I did it in bash. However, you could just as easily write a Windows BAT script or even a Python script. The key is that you need a mechanism that launches Blender and lets that instance run in the background while you launch another instance of Blender. My bash script looks like this:

#!/bin/bash

if [ $# -eq 0 ]; then
    echo "You forgot to mention the .blend file you want to render"
else
    # Launch one background Blender instance per CUDA device (0 through 3)
    for i in $(seq 0 3);
    do
        blender -b "$1" --python /path/to/batch_cycles.py -a -- $i &
    done
fi

In short, this script launches 4 instances of Blender (seq 0 3), and each instance uses a different CUDA device. The ampersand (&) at the end is what lets each instance of Blender run in the background. I'm not sure what the correct flag is in Windows. It does exist, though. I'd looked it up at one point, but have since forgotten it.

And that's basically how I do it. I know this could certainly stand to be refined, but in the meantime, it does the job for me.

@nitheeshas

Totally agree with this. I had the same revelation while working with Blender. By the way, you can just pass the ID for CUDA_x to the render script instead of creating 4 different copies of the same file; it gets pretty messy when working with bigger render scripts.

@Fweeb

Fweeb commented Mar 13, 2017

@nitheeshas Very true. Also, the API has changed since I wrote this gist. Let me see about updating it...

annnnnd... updated :)

@bholzer

bholzer commented Mar 2, 2018

I am handling this differently for ease of use on machines with different GPU counts, using yours as inspiration. Figured I'd post my solution:

render.sh

#!/bin/bash
# must be run with sudo to configure GPUs

# configure the GPU settings to be persistent
nvidia-smi -pm 1
# disable the autoboost feature for all GPUs on the instance
nvidia-smi --auto-boost-default=0
# set all GPU clock speeds to their maximum frequency
# (these values are GPU-specific; check "nvidia-smi -q -d SUPPORTED_CLOCKS" for yours)
nvidia-smi -ac 2505,875

blend_file=$1
start_frame=$2
end_frame=$3
gpu_count=$4
interval=$(( (end_frame-start_frame)/gpu_count ))

for ((i=0;i<gpu_count;i++))
do
  # Have each gpu render a set of frames
  local_start=$(( start_frame+(i*interval)+i ))
  local_end=$(( local_start + interval ))

  if [ $local_end -gt $end_frame ]; then
    local_end=$end_frame
  fi

  blender -b "$blend_file" \
    -noaudio -nojoystick --use-extension 1 \
    -E CYCLES \
    -t 0 \
    -s $local_start \
    -e $local_end \
    -P batch_cycles.py -a -- $i &
done
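
For example, assuming the script above is saved as render.sh alongside batch_cycles.py, rendering frames 1 through 240 of a hypothetical scene.blend across 4 GPUs would look like this:

# run as root so the nvidia-smi configuration calls at the top succeed
sudo ./render.sh scene.blend 1 240 4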

@sbrl

sbrl commented Mar 3, 2022

For Nvidia GPUs, see also the CUDA_VISIBLE_DEVICES environment variable, which takes a comma-separated list of integers specifying the GPUs that should be made visible to the running application. Those integers are the indexes you can find in the nvidia-smi command output.
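
For example, here's a rough sketch of the same four-instance launch driven by CUDA_VISIBLE_DEVICES instead of the Python script. It assumes four GPUs, that the .blend is already set up to render on the GPU, and that "Overwrite" is off and "Placeholders" is on in its output settings so the instances don't render the same frames:

#!/bin/bash
# Launch one background Blender instance per GPU; each instance can only
# see the single CUDA device listed in CUDA_VISIBLE_DEVICES.
for i in 0 1 2 3; do
    CUDA_VISIBLE_DEVICES=$i blender -b "$1" -a &
done
wait    # block until all four renders have finished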

@robomartion

Using this, could you basically combine the VRAM of multiple cards without NVLink? So a 3840x2160 frame of a scene that requires 48GB of VRAM is split into four 960x2160 frames of 12GB each, allowing four RTX 3080 Tis to render it as if they were two RTX 3090s?

@Fweeb

Fweeb commented Apr 1, 2022

@robomartion I haven't tested for that particular use-case, but hypothetically it could be used that way. Final render resolution isn't necessarily the only determiner for how much VRAM a scene uses. Objects off-camera contribute to lighting, so the size of the assembled BVH is likely to play a part as well (and may not actually be smaller if you render just a crop of your intended final image).

@tanjary21

Hi! This is a great piece of scripting!

Now, I understand how to use this to render animations on multiple GPUs, and how the workload is distributed across them for better parallelism.

However, I'm also looking for a way to distribute the rendering of a single frame across multiple GPUs. Has anyone found a way to do that? I guess it has to do with specifying which tiles get rendered on which device, rather than which frames get rendered on which device, but I haven't been able to find the appropriate command-line argument for it.

@sbrl

sbrl commented Nov 12, 2023

@tanjary21: This gist points out that for shorter renders, rendering a single frame over multiple GPUs with a single Blender instance will actually take longer than it would otherwise, due to the additional setup/teardown.

However, if you go into the system tab of the settings/preferences, you should be able to tick the box to get Blender to use multiple GPUs in the Blender GUI.

Not sure on a command-line alternative though.
