
@squadbox
Last active October 20, 2024 09:32
A script to control Nvidia GPU fan speed on headless (non-X) Linux nodes
#!/bin/bash
# cool_gpu2.sh: This script enables or disables a fixed GPU fan speed
#
# Description: A script to control GPU fan speed on headless (non-X) Linux nodes
# Original Script by Axel Kohlmeyer <[email protected]>
# https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness
#
# Modified for newer drivers and removed old work-arounds
# Tested on Ubuntu 14.04 with driver 352.41
# Copyright 2015, squadbox
# Requirements:
# * An Nvidia GPU
# * Nvidia Driver V285 or later
# * xorg
# * Coolbits enabled and an empty initial X configuration allowed, e.g.:
# nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
# You may have to run this as root or with sudo if the current user is not authorized to start X sessions.
# Paths to the utilities we will need
SMI='/usr/bin/nvidia-smi'
SET='/usr/bin/nvidia-settings'
# Determine the major driver version
VER=$(awk '/NVIDIA/ {print $8}' /proc/driver/nvidia/version | cut -d . -f 1)
# Drivers from 285.x.y onward allow setting persistence mode
if [ "${VER}" -lt 285 ]
then
    echo "Error: Current driver version is ${VER}. Driver version must be 285 or later."; exit 1;
fi
# Read a numerical command-line argument between 40 and 100
if [ "$1" -eq "$1" ] 2>/dev/null && [ "0$1" -ge "40" ] && [ "0$1" -le "100" ]
then
    ${SMI} -pm 1 # enable persistence mode
    speed=$1     # set speed
    echo "Setting fan to ${speed}%."
    # How many GPUs are in the system?
    NUMGPU="$(${SMI} -L | wc -l)"
    # Loop through each GPU and individually set its fan speed
    n=0
    while [ $n -lt $NUMGPU ]
    do
        # Start an X session, and call nvidia-settings to enable fan control and set the speed
        xinit ${SET} -a [gpu:${n}]/GPUFanControlState=1 -a [fan:${n}]/GPUTargetFanSpeed=${speed} -- :0 -once
        n=$((n + 1))
    done
    echo "Complete"; exit 0;
elif [ "x$1" = "xstop" ]
then
$SMI -pm 0 # disable persistance mode
echo "Enabling default auto fan control."
# how many GPU's are in the system?
NUMGPU="$(nvidia-smi -L | wc -l)"
# loop through each GPU and individually set fan speed
n=0
while [ $n -lt $NUMGPU ];
do
# start an x session, and call nvidia-settings to enable fan control and set speed
xinit ${SET} -a [gpu:${n}]/GPUFanControlState=0 -- :0 -once
let n=n+1
done
echo "Complete"; exit 0;
else
    echo "Error: Please pick a fan speed between 40 and 100, or 'stop'."; exit 1;
fi
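
For reference, a couple of example invocations, assuming the script has been saved as cool_gpu2.sh and made executable (sudo is only needed if your user is not authorized to start X sessions, per the note in the header):

# Pin all GPU fans at a fixed 80%
sudo ./cool_gpu2.sh 80

# Hand fan control back to the driver
sudo ./cool_gpu2.sh stop
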
@Macrum

Macrum commented Jul 1, 2017

Hello,
first I want to thank you for sharing this script.
Works great, but unfortunately only for my first GPU.

For all other GPUs I get the following error:

ERROR: Error assigning value 100 to attribute 'GPUTargetFanSpeed'
       (hostname:0[fan:1]) as specified in assignment
       '[fan:1]/GPUTargetFanSpeed=100' (Unknown Error).

Would be really nice if you could have a look into it.

Thanks in advance!

@raoulh

raoulh commented Oct 30, 2017

As a workaround for the error when setting the fan speed on GPU 1 or 2, you can try this:

nvidia-xconfig -s -a --force-generate --allow-empty-initial-configuration --cool-bits=12 --registry-dwords="PerfLevelSrc=0x2222" --no-sli --connected-monitor="DFP-0"

Then it worked on my RIG.

@khavernathy

@raoulh Lifesaver, same here. Thanks.

@wlara

wlara commented Mar 18, 2018

Has anyone found a solution for these errors?

ERROR: Error querying enabled displays on GPU 0 (Missing Extension).
ERROR: Error querying connected displays on GPU 0 (Missing Extension).
ERROR: Error resolving target specification 'gpu:0' (No targets match target
       specification), specified in assignment '[gpu:0]/GPUFanControlState=1'.

@streslab

@wlara try running:
export DISPLAY=:0.0
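
For example, a minimal sketch assuming an X server is already listening on display :0 (such as one started by the script above):

export DISPLAY=:0.0
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=60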

@tinfever

I'm attempting to use this on an Ubuntu server install, and it does work after installing xinit and related packages. But after X is killed, the GPUs become stuck in the low-power state P8, which is essentially idle. This doesn't occur if I install and run lightdm so that an X instance stays running on each of the GPUs. Any thoughts?

I know it works this way, but it seems like blasphemy to have to install lightdm on a headless machine.

@isarandi

I finally solved my problem. Previously I got the error

ERROR: Error assigning value 100 to attribute 'GPUTargetFanSpeed'
       (hostname:0[fan:1]) as specified in assignment
       '[fan:1]/GPUTargetFanSpeed=100' (Unknown Error).

In my case (Titan RTX), each GPU has two individually tunable fans! So fan:0 and fan:1 go with gpu:0, and fan:2 and fan:3 with gpu:1.

nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=100 -a [fan:1]/GPUTargetFanSpeed=100 -c :0
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=100 -a [fan:3]/GPUTargetFanSpeed=100 -c :0

Hope it helps!
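
For more GPUs, a generalized version of the above — just a sketch, assuming every card exposes exactly two fans as on the Titan RTX (FANS_PER_GPU is a made-up knob; adjust it for other cards):

FANS_PER_GPU=2   # assumption: two individually tunable fans per GPU
speed=100
NUMGPU="$(nvidia-smi -L | wc -l)"
for ((g = 0; g < NUMGPU; g++)); do
    # Enable manual fan control on this GPU...
    args=(-a "[gpu:${g}]/GPUFanControlState=1")
    # ...and target each of its fans; fans are numbered consecutively across GPUs
    for ((f = g * FANS_PER_GPU; f < (g + 1) * FANS_PER_GPU; f++)); do
        args+=(-a "[fan:${f}]/GPUTargetFanSpeed=${speed}")
    done
    nvidia-settings "${args[@]}" -c :0
done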

@andyljones

I've adapted the work this is based on into a pip-installable Python script.
