So here's the premise: For scenes that take around a minute or less to render, performance is actually worse if you render on all of the cards with a single instance of Blender. This is because (AFAIK) there's a bit of additional time necessary to collect the render results from each card and stitch them together. That time is a fixed short duration, so it's negligible on larger/longer render jobs. However, on shorter render jobs, the 'stitch time' has a much more significant impact.
I ran into this with a machine I render on that has 4 Quadro K6000s in it. To render animations, I ended up writing a few little scripts to facilitate launching 4 separate instances of Blender, each one tied to one GPU. Overall render time was much shorter with that setup than with one instance of Blender using all 4 GPUs.
The setup works basically like this... I have the following Python script (it can be anywhere on your hard drive, so long as you remember the path to it). I call it batch_cycles.py:
import sys
import bpy

# Everything after "--" is meant for this script; the first argument
# is the index of the CUDA device this instance should render on.
argv = sys.argv[sys.argv.index("--") + 1:]
if len(argv) > 0:
    dev = int(argv[0])

C = bpy.context
cycles_prefs = C.user_preferences.addons['cycles'].preferences

# Don't overwrite existing frames and do write placeholder files,
# so multiple instances can safely share the same frame range.
C.scene.render.use_overwrite = False
C.scene.render.use_placeholder = True

# Enable only the requested CUDA device; disable all the others.
cycles_prefs.compute_device_type = "CUDA"
for device in cycles_prefs.devices:
    device.use = False
cycles_prefs.devices[dev].use = True
This script takes a single compute device (an integer number) as an input argument (argv). The script disables all compute devices except for the one specified. By doing that, only one GPU is used for this running instance of Blender. The use_overwrite and use_placeholder settings are also critical to have in there. Otherwise, all of my Blender instances will be rendering the same frames.
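For example, to launch a single instance that renders only on the third GPU (device index 2), the call would look something like this (the .blend file name and the path to the script are just placeholders for your own):

blender -b my_scene.blend --python /path/to/batch_cycles.py -a -- 2

Everything after the -- is ignored by Blender itself and simply passed through to the Python script, which is how the device index gets in.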
With the batch_cycles.py script created and accessible, I just need a script to launch 4 instances of Blender. As I'm in Linux, I did it in bash. However, you could just as easily write a Windows BAT script or even a Python script. The key is that you need a mechanism that launches Blender and lets that instance run in the background while you launch another instance of Blender. My bash script looks like this:
#!/bin/bash
if [ $# = 0 ]; then
    echo "You forgot to mention the .blend file you want to render"
else
    for i in `seq 0 3`;
    do
        blender -b $1 --python /path/to/batch_cycles.py -a -- $i &
    done
fi
In short, this script launches 4 instances of Blender (seq 0 3), and each instance uses a different CUDA device. The ampersand (&) at the end is what lets each instance of Blender run in the background. I'm not sure what the correct flag is in Windows. It does exist, though. I'd looked it up at one point, but have since forgotten it.
And that's basically how I do it. Now, I know this could certainly stand to be refined, but in the meantime it does the job for me.
Hi! This is a great piece of scripting!
Now, I understood how to use this to render animations on multiple GPUs and I understood how the workload is distributed to gain increased parallelization.
However, I'm also looking for a way to distribute the rendering of a single frame across multiple GPUs. Has anyone found a way to do that? I guess it has to do with specifying which tiles get rendered on which device, rather than specifying which frames get rendered on which device. I haven't been able to find the appropriate command-line argument for it.