#!/bin/bash
# cool_gpu2.sh This script will enable or disable fixed GPU fan speed
#
# Description: A script to control GPU fan speed on headless (non-X) Linux nodes
# Original script by Axel Kohlmeyer <[email protected]>
# https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness
#
# Modified for newer drivers; old work-arounds removed.
# Tested on Ubuntu 14.04 with driver 352.41.
# Copyright 2015, squadbox
#
# Requirements:
# * An NVIDIA GPU
# * NVIDIA driver v285 or later
# * xorg
# * Coolbits enabled and an empty initial configuration allowed:
#     nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
#   (cool-bits 28 = 4 [manual fan control] + 8 [clock offsets] + 16 [voltage offsets])
#   You may have to run this as root or with sudo if the current user is not
#   authorized to start X sessions.

# Paths to the utilities we will need
SMI='/usr/bin/nvidia-smi'
SET='/usr/bin/nvidia-settings'

# Determine the major driver version
VER=$(awk '/NVIDIA/ {print $8}' /proc/driver/nvidia/version | cut -d . -f 1)

# Drivers from 285.x.y onward allow setting persistence mode
if [ "${VER}" -lt 285 ]
then
    echo "Error: Current driver version is ${VER}. Driver version must be 285 or later."; exit 1;
fi

# Read a numerical command-line argument between 40 and 100.
# The first test checks that $1 is an integer (non-integers make the
# comparison itself fail); the leading "0" in the range tests keeps them
# from breaking on an empty or oddly formed argument.
if [ "$1" -eq "$1" ] 2>/dev/null && [ "0$1" -ge "40" ] && [ "0$1" -le "100" ]
then
    $SMI -pm 1   # enable persistence mode
    speed=$1     # set speed
    echo "Setting fan to ${speed}%."
    # How many GPUs are in the system?
    NUMGPU="$(nvidia-smi -L | wc -l)"
    # Loop through each GPU and individually set its fan speed
    n=0
    while [ $n -lt $NUMGPU ]
    do
        # Start an X session, and call nvidia-settings to enable fan control and set the speed
        xinit ${SET} -a [gpu:${n}]/GPUFanControlState=1 -a [fan:${n}]/GPUTargetFanSpeed=${speed} -- :0 -once
        let n=n+1
    done
    echo "Complete"; exit 0;
elif [ "x$1" = "xstop" ]
then
    $SMI -pm 0   # disable persistence mode
    echo "Enabling default auto fan control."
    # How many GPUs are in the system?
    NUMGPU="$(nvidia-smi -L | wc -l)"
    # Loop through each GPU and individually restore automatic fan control
    n=0
    while [ $n -lt $NUMGPU ]
    do
        # Start an X session, and call nvidia-settings to return fan control to the driver
        xinit ${SET} -a [gpu:${n}]/GPUFanControlState=0 -- :0 -once
        let n=n+1
    done
    echo "Complete"; exit 0;
else
    echo "Error: Please pick a fan speed between 40 and 100, or stop."; exit 1;
fi
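For reference, a quick usage sketch (the speed value is just an example; sudo is needed if your user cannot start X sessions):

chmod +x cool_gpu2.sh
sudo ./cool_gpu2.sh 75     # pin all fans at 75%
sudo ./cool_gpu2.sh stop   # give fan control back to the driver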
This works great for setting fan speed on three 1070s, but it causes the second and third cards to slow down drastically. When mining, they run at about 10% of their normal rate. Is this due to them still being attached to a screen?
Hello,
First, I want to thank you for sharing this script.
It works great, but unfortunately only for my first GPU.
For all other GPUs I get the following error:
ERROR: Error assigning value 100 to attribute 'GPUTargetFanSpeed'
(hostname:0[fan:1]) as specified in assignment
'[fan:1]/GPUTargetFanSpeed=100' (Unknown Error).
It would be really nice if you could look into it.
Thanks in advance!
As a workaround for the error when setting the fan speed on GPU 1 or 2, you can try this:
nvidia-xconfig -s -a --force-generate --allow-empty-initial-configuration --cool-bits=12 --registry-dwords="PerfLevelSrc=0x2222" --no-sli --connected-monitor="DFP-0"
After that it worked on my rig.
@raoulh Lifesaver, same here. Thanks.
Has anyone found a solution for these errors?
ERROR: Error querying enabled displays on GPU 0 (Missing Extension).
ERROR: Error querying connected displays on GPU 0 (Missing Extension).
ERROR: Error resolving target specification 'gpu:0' (No targets match target
specification), specified in assignment '[gpu:0]/GPUFanControlState=1'.
@wlara try running:
export DISPLAY=:0.0
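Alternatively, as the commands further down in this thread do, you can point nvidia-settings at the X display explicitly instead of exporting DISPLAY; a minimal sketch:

nvidia-settings -c :0 -a [gpu:0]/GPUFanControlState=1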
I'm attempting to use this on an Ubuntu server install, and it does work after installing xinit and related packages. But once X is killed, the GPUs become stuck in the low-power state P8, which is essentially idle. This doesn't happen if I install and run lightdm so that an X instance stays running on each GPU. Any thoughts?
I know it works that way, but it seems like blasphemy to have to install lightdm on a headless machine.
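If the goal is just to keep the cards out of P8 without a resident X server, one avenue worth trying is pinning the clocks with nvidia-smi. This is a sketch, not something verified on this setup: it requires a considerably newer driver than the 352.x the script was tested with (roughly 415 or later), and the clock values below are placeholders you should replace with values from your card's supported range.

sudo nvidia-smi -pm 1                    # persistence mode keeps the driver loaded
sudo nvidia-smi -q -d SUPPORTED_CLOCKS   # pick supported graphics clocks from this list
sudo nvidia-smi -lgc 1200,1900           # placeholder min,max (MHz); locks GPU clocks
sudo nvidia-smi -rgc                     # run later to return clocks to driver control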
I finally solved my problem. Previously I got the error:
ERROR: Error assigning value 100 to attribute 'GPUTargetFanSpeed'
(hostname:0[fan:1]) as specified in assignment
'[fan:1]/GPUTargetFanSpeed=100' (Unknown Error).
In my case (Titan RTX), each GPU has two individually tunable fans! So fan:0 and fan:1 have to be set together with gpu:0, and fan:2 and fan:3 with gpu:1:
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=100 -a [fan:1]/GPUTargetFanSpeed=100 -c :0
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=100 -a [fan:3]/GPUTargetFanSpeed=100 -c :0
Hope it helps!
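Building on that, here is a minimal sketch that generalizes the per-fan assignments to any number of GPUs. The FANS_PER_GPU value is an assumption; adjust it for your hardware, or check the actual fan list with nvidia-settings -q fans.

#!/bin/bash
# Set every fan on every GPU to the same target speed, assuming a fixed
# number of fans per GPU (FANS_PER_GPU=2 is a guess; adjust for your cards).
FANS_PER_GPU=2
speed=100
NUMGPU="$(nvidia-smi -L | wc -l)"
for (( g = 0; g < NUMGPU; g++ )); do
    args="-a [gpu:${g}]/GPUFanControlState=1"
    # Fans are numbered consecutively across GPUs: gpu:0 owns fan:0..1, gpu:1 owns fan:2..3, etc.
    for (( f = g * FANS_PER_GPU; f < (g + 1) * FANS_PER_GPU; f++ )); do
        args="${args} -a [fan:${f}]/GPUTargetFanSpeed=${speed}"
    done
    nvidia-settings ${args} -c :0
done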
I've adapted the work this script is based on into a pip-installable Python script.
Hi,
Thanks for this hack. However, I get the following errors for each GPU:
ERROR: Error querying enabled displays on GPU 0 (Missing Extension).
ERROR: Error querying connected displays on GPU 0 (Missing Extension).
ERROR: Error resolving target specification 'gpu:1' (No targets match target specification), specified in assignment '[gpu:3]/GPUFanControlState=1'.
I am using the 375.26 NVIDIA drivers on a headless tower. If you have any ideas on how to solve this, that would be great.
Thanks!