@squadbox
Last active October 20, 2024 09:32
A script to control Nvidia GPU fan speed on headless (non-X) linux nodes
#!/bin/bash
# cool_gpu2.sh This script will enable or disable a fixed GPU fan speed.
#
# Description: A script to control GPU fan speed on headless (non-X) Linux nodes
# Original script by Axel Kohlmeyer <[email protected]>
# https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness
#
# Modified for newer drivers; old work-arounds removed.
# Tested on Ubuntu 14.04 with driver 352.41
# Copyright 2015, squadbox
#
# Requirements:
# * An Nvidia GPU
# * Nvidia driver v285 or later
# * xorg
# * Coolbits enabled and an empty initial X configuration allowed:
#     nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
#
# You may have to run this as root or with sudo if the current user is not
# authorized to start X sessions.

# Paths to the utilities we will need
SMI='/usr/bin/nvidia-smi'
SET='/usr/bin/nvidia-settings'

# Determine the major driver version
VER=`awk '/NVIDIA/ {print $8}' /proc/driver/nvidia/version | cut -d . -f 1`

# Drivers from 285.x.y on allow the persistence mode setting
if [ ${VER} -lt 285 ]
then
    echo "Error: Current driver version is ${VER}. Driver version must be at least 285."; exit 1;
fi

# Read a numerical command line argument between 40 and 100
if [ "$1" -eq "$1" ] 2>/dev/null && [ "0$1" -ge "40" ] && [ "0$1" -le "100" ]
then
    $SMI -pm 1 # enable persistence mode
    speed=$1   # set speed
    echo "Setting fan to $speed%."

    # how many GPUs are in the system?
    NUMGPU="$(nvidia-smi -L | wc -l)"

    # loop through each GPU and individually set the fan speed
    n=0
    while [ $n -lt $NUMGPU ];
    do
        # start an X session, and call nvidia-settings to enable fan control and set the speed
        xinit ${SET} -a [gpu:${n}]/GPUFanControlState=1 -a [fan:${n}]/GPUTargetFanSpeed=$speed -- :0 -once
        let n=n+1
    done

    echo "Complete"; exit 0;
elif [ "x$1" = "xstop" ]
then
    $SMI -pm 0 # disable persistence mode
    echo "Enabling default auto fan control."

    # how many GPUs are in the system?
    NUMGPU="$(nvidia-smi -L | wc -l)"

    # loop through each GPU and individually restore automatic fan control
    n=0
    while [ $n -lt $NUMGPU ];
    do
        # start an X session, and call nvidia-settings to return fan control to the driver
        xinit ${SET} -a [gpu:${n}]/GPUFanControlState=0 -- :0 -once
        let n=n+1
    done

    echo "Complete"; exit 0;
else
    echo "Error: Please pick a fan speed between 40 and 100, or stop."; exit 1;
fi
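
For reference, a typical invocation might look like this (a sketch, assuming the script is saved as cool_gpu2.sh and marked executable):

sudo ./cool_gpu2.sh 80    # pin all GPU fans to 80%
sudo ./cool_gpu2.sh stop  # return to the driver's automatic fan control
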
@mattics

mattics commented Jun 30, 2017

This seems to be working great to set the fan speed on three 1070s, but it causes the second and third cards to slow down drastically. When mining they run at about 10% of their normal rate. Is this because they are still attached to a screen?

@Macrum

Macrum commented Jul 1, 2017

Hello,
First, I want to thank you for sharing this script.
It works great, but unfortunately only for my first GPU.

For all other GPUs I get the following error:

ERROR: Error assigning value 100 to attribute 'GPUTargetFanSpeed'
       (hostname:0[fan:1]) as specified in assignment
       '[fan:1]/GPUTargetFanSpeed=100' (Unknown Error).

It would be really nice if you could take a look into this.

Thanks in advance!

@raoulh

raoulh commented Oct 30, 2017

As a workaround for the error when setting the fan speed on GPU 1 or 2, you can try this:

nvidia-xconfig -s -a --force-generate --allow-empty-initial-configuration --cool-bits=12 --registry-dwords="PerfLevelSrc=0x2222" --no-sli --connected-monitor="DFP-0"

After that it worked on my rig.

@khavernathy

@raoulh Lifesaver, same here. Thanks.

@wlara

wlara commented Mar 18, 2018

Has anyone found a solution for these errors?

ERROR: Error querying enabled displays on GPU 0 (Missing Extension).
ERROR: Error querying connected displays on GPU 0 (Missing Extension).
ERROR: Error resolving target specification 'gpu:0' (No targets match target
       specification), specified in assignment '[gpu:0]/GPUFanControlState=1'.

@streslab

@wlara try running:
export DISPLAY=:0.0
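
For example (a sketch, assuming the X server used by the script is running on display :0; the 80% value is just an example speed):

export DISPLAY=:0.0
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=80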

@tinfever

I'm attempting to use this on an Ubuntu Server install, and it does work after installing xinit and related packages. However, after X is killed, the GPUs become stuck in the low-power state P8, which is essentially idle. This doesn't happen if I install and run lightdm so that an X instance stays running on each of the GPUs. Any thoughts?

I know it works this way, but it seems like blasphemy to have to install lightdm on a headless machine.

@isarandi

I finally solved my problem. Previously I got the error

ERROR: Error assigning value 100 to attribute 'GPUTargetFanSpeed'
       (hostname:0[fan:1]) as specified in assignment
       '[fan:1]/GPUTargetFanSpeed=100' (Unknown Error).

In my case (Titan RTX), each GPU has two individually tunable fans! So fan:0 and fan:1 have to be set along with gpu:0, and fan:2 and fan:3 along with gpu:1.

nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=100 -a [fan:1]/GPUTargetFanSpeed=100 -c :0
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=100 -a [fan:3]/GPUTargetFanSpeed=100 -c :0
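
For nodes with more GPUs, a minimal sketch of the same pattern (assuming every GPU exposes exactly two fans, numbered 2*i and 2*i+1, which may not hold for all cards):

# assumes two fans per GPU, indexed 2*i and 2*i+1 (as on the Titan RTX above)
NUMGPU="$(nvidia-smi -L | wc -l)"
for ((i = 0; i < NUMGPU; i++)); do
    nvidia-settings -c :0 \
        -a "[gpu:${i}]/GPUFanControlState=1" \
        -a "[fan:$((2*i))]/GPUTargetFanSpeed=100" \
        -a "[fan:$((2*i+1))]/GPUTargetFanSpeed=100"
done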

Hope it helps!

@andyljones

I've adapted the work this is based on into a pip-installable Python script.
