#!/bin/bash
# cool_gpu2.sh  This script will enable or disable fixed GPU fan speed
#
# Description: A script to control GPU fan speed on headless (non-X) Linux nodes
# Original Script by Axel Kohlmeyer <[email protected]>
# https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness
#
# Modified for newer drivers and removed old work-arounds
# Tested on Ubuntu 14.04 with driver 352.41
# Copyright 2015, squadbox
# Requirements:
# * An Nvidia GPU
# * Nvidia Driver V285 or later
# * xorg
# * Coolbits enabled and empty config setting
#     nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration
# You may have to run this as root or with sudo if the current user is not authorized to start X sessions.

# Paths to the utilities we will need
SMI='/usr/bin/nvidia-smi'
SET='/usr/bin/nvidia-settings'

# Determine major driver version
VER=$(awk '/NVIDIA/ {print $8}' /proc/driver/nvidia/version | cut -d . -f 1)

# Drivers from 285.x.y on allow persistence mode setting
if [ "${VER}" -lt 285 ]
then
    echo "Error: Current driver version is ${VER}. Driver version must be 285 or later."; exit 1;
fi

# Read a numerical command line arg between 40 and 100
if [ "$1" -eq "$1" ] 2>/dev/null && [ "0$1" -ge "40" ] && [ "0$1" -le "100" ]
then
    $SMI -pm 1 # enable persistence mode
    speed=$1   # set speed
    echo "Setting fan to $speed%."
    # how many GPUs are in the system?
    NUMGPU="$($SMI -L | wc -l)"
    # loop through each GPU and individually set fan speed
    n=0
    while [ "$n" -lt "$NUMGPU" ];
    do
        # start an X session, and call nvidia-settings to enable fan control and set speed
        # (assumes one fan per GPU, so the fan index equals the GPU index)
        xinit ${SET} -a "[gpu:${n}]/GPUFanControlState=1" -a "[fan:${n}]/GPUTargetFanSpeed=$speed" -- :0 -once
        n=$((n + 1))
    done
    echo "Complete"; exit 0;
elif [ "x$1" = "xstop" ]
then
    $SMI -pm 0 # disable persistence mode
    echo "Enabling default auto fan control."
    # how many GPUs are in the system?
    NUMGPU="$($SMI -L | wc -l)"
    # loop through each GPU and hand fan control back to the driver
    n=0
    while [ "$n" -lt "$NUMGPU" ];
    do
        # start an X session, and call nvidia-settings to disable manual fan control
        xinit ${SET} -a "[gpu:${n}]/GPUFanControlState=0" -- :0 -once
        n=$((n + 1))
    done
    echo "Complete"; exit 0;
else
    echo "Error: Please pick a fan speed between 40 and 100, or stop."; exit 1;
fi
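The script accepts either a fan speed from 40 to 100 or the word stop, and its numeric check relies on the `-eq` test failing for non-numbers. As a rough sketch only, the same check could be written as an explicit helper. The name `is_valid_speed` is hypothetical and not part of the original script.

```shell
# Hypothetical helper (not in the original script): succeeds only if $1 is
# a whole number between 40 and 100 inclusive.
is_valid_speed() {
    case "$1" in
        # Reject an empty argument or anything containing a non-digit,
        # e.g. "", "stop", "4x" or "-5".
        ''|*[!0-9]*) return 1 ;;
    esac
    # The test builtin compares the digits as decimal integers.
    [ "$1" -ge 40 ] && [ "$1" -le 100 ]
}
```

Going by the script's own argument handling, `sudo ./cool_gpu2.sh 80` would pin every fan at 80% and `sudo ./cool_gpu2.sh stop` would restore automatic control.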
Has anyone found a solution for these errors?
ERROR: Error querying enabled displays on GPU 0 (Missing Extension).
ERROR: Error querying connected displays on GPU 0 (Missing Extension).
ERROR: Error resolving target specification 'gpu:0' (No targets match target
specification), specified in assignment '[gpu:0]/GPUFanControlState=1'.
@wlara try running:
export DISPLAY=:0.0
I'm attempting to use this on an Ubuntu server install, and it does work after installing xinit and related packages. But after X is killed, the GPUs become stuck in low power state P8, which is essentially idle. This doesn't occur if I install and run lightdm so that an X instance stays running on each of the GPUs, though. Any thoughts?
I know it works this way but it seems like blasphemy to have to install lightdm on a headless machine.
I finally solved my problem. Previously I got the error
ERROR: Error assigning value 100 to attribute 'GPUTargetFanSpeed'
(hostname:0[fan:1]) as specified in assignment
'[fan:1]/GPUTargetFanSpeed=100' (Unknown Error).
In my case (Titan RTX), each GPU has two individually tunable fans! So fan:0 and fan:1 have to be set together with gpu:0, and fan:2 and fan:3 with gpu:1.
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=100 -a [fan:1]/GPUTargetFanSpeed=100 -c :0
nvidia-settings -a [gpu:1]/GPUFanControlState=1 -a [fan:2]/GPUTargetFanSpeed=100 -a [fan:3]/GPUTargetFanSpeed=100 -c :0
Hope it helps!
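For anyone with more than two cards, here is a rough sketch that generates those assignments instead of typing them out. It assumes fan indices are numbered consecutively across GPUs, as in the two commands above (gpu:0 owns fan:0 and fan:1, gpu:1 owns fan:2 and fan:3); that may not hold on every board. The function name `fan_assignments` and its parameters are made up for illustration.

```shell
# Hypothetical helper: print the nvidia-settings assignments for one GPU.
# Arguments: GPU index, fans per GPU, target speed in percent.
# Assumption: GPU g owns fans g*F .. g*F+F-1, where F is fans per GPU.
fan_assignments() {
    local gpu="$1" fans_per_gpu="$2" speed="$3"
    local i=0 fan

    # Switch this GPU to manual fan control first.
    printf '%s' "-a [gpu:${gpu}]/GPUFanControlState=1"

    # Then append one target-speed assignment per fan owned by this GPU.
    while [ "$i" -lt "$fans_per_gpu" ]; do
        fan=$((gpu * fans_per_gpu + i))
        printf '%s' " -a [fan:${fan}]/GPUTargetFanSpeed=${speed}"
        i=$((i + 1))
    done
    printf '\n'
}
```

Something like `nvidia-settings $(fan_assignments 1 2 100) -c :0` should then expand to the second command above. The substitution is left unquoted on purpose so the output splits into separate arguments.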
I've adapted the work this is based on into a pip-installable Python script
@raoulh lifesaver. same here. Thanks.