A GPU card can be configured in one of two modes: vSGA (shared virtual graphics) or vGPU. For compute workloads, such as machine learning or high-performance computing applications, the NVIDIA card should be configured in vGPU mode. To enable vGPU mode on the ESXi host, run this command in the ESXi shell:
esxcli graphics host set --default-type SharedPassthru
Check the host graphics settings:
[root@esxi-tesla:~] esxcli graphics host get
Default Graphics Type: SharedPassthru
Shared Passthru Assignment Policy: Performance
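The check above can also be scripted. The following is a minimal sketch, assuming the output format shown here; the function name `is_vgpu_mode` is hypothetical and not part of esxcli.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcli graphics host get` output on stdin and
# succeeds only if the default graphics type is SharedPassthru (vGPU mode).
is_vgpu_mode() {
    grep -Eq '^[[:space:]]*Default Graphics Type:[[:space:]]*SharedPassthru[[:space:]]*$'
}
```

On the host it could be used as `esxcli graphics host get | is_vgpu_mode && echo "vGPU mode enabled"`.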
The NVIDIA Virtual GPU Manager for VMware vSphere ESXi is distributed as a vSphere Installation Bundle (VIB) file, which you can download from the NVIDIA site.
Copy the NVIDIA Virtual GPU Manager VIB file to the ESXi host (for example, using SCP).
Put the ESXi host into maintenance mode.
$ esxcli system maintenanceMode set -e true
Run the esxcli command to install the NVIDIA Virtual GPU Manager from the VIB file.
$ esxcli software vib install -v directory/NVIDIA**.vib
Here, directory is the absolute path to the directory that contains the VIB file.
Exit maintenance mode.
$ esxcli system maintenanceMode set -e false
Reboot the ESXi host.
$ reboot
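The installation steps above can be collected into one script. This is only a sketch: `run`, `install_vgpu_manager`, and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the VIB path is whatever you pass in. By default it only prints the commands; setting `DRY_RUN=0` would actually execute them.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# $1: absolute path (or URL) of the NVIDIA vGPU Manager VIB file.
install_vgpu_manager() {
    vib_path=$1
    case $vib_path in
        /*|*://*) ;;  # esxcli expects an absolute path or a URL for -v
        *) echo "error: VIB path must be absolute: $vib_path" >&2; return 1 ;;
    esac
    # Same order as the manual steps: maintenance mode, install, exit, reboot.
    run esxcli system maintenanceMode set -e true &&
    run esxcli software vib install -v "$vib_path" &&
    run esxcli system maintenanceMode set -e false &&
    run reboot
}
```

For example, `install_vgpu_manager /tmp/example.vib` (a placeholder path) prints the four commands without touching the host.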
Verify that the NVIDIA kernel driver can successfully communicate with the physical GPUs in your system by running the nvidia-smi command without any options.
$ nvidia-smi
If successful, the nvidia-smi command lists all the GPUs in your system.
[root@esxi-tesla:~] nvidia-smi
Wed Jan 22 04:48:03 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.43       Driver Version: 440.43       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:65:00.0 Off |                  Off |
| N/A   52C    P8    18W /  70W |   4118MiB / 16383MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
One example vGPU profile we can choose for the VM from the available profiles is grid_t4-8q. This profile allows the VM to use at most 8 GB of the physical GPU's memory (16 GB in total). The grid_t4-4q profile allows 4 GB.
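The arithmetic behind the profile choice can be sketched as follows, assuming the number before the trailing letter in the profile name is the framebuffer size in GB and that every VM on the card uses the same profile. The function names `profile_gb` and `vms_per_gpu` are hypothetical.

```shell
#!/bin/sh
# Extract the framebuffer size in GB from a profile name such as grid_t4-8q.
profile_gb() {
    echo "$1" | sed -n 's/^.*-\([0-9][0-9]*\)[a-z]$/\1/p'
}

# Print how many VMs using profile $1 fit on a GPU with $2 GB of memory.
vms_per_gpu() {
    gb=$(profile_gb "$1")
    if [ -z "$gb" ]; then
        echo "error: unrecognized profile name: $1" >&2
        return 1
    fi
    echo $(( $2 / gb ))
}
```

With the 16 GB T4 used here, `vms_per_gpu grid_t4-8q 16` prints 2 and `vms_per_gpu grid_t4-4q 16` prints 4.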
In the Linux guest VM, ensure that developer tools such as gcc are installed by running the following commands:
sudo apt update
sudo apt upgrade
sudo apt install build-essential
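Before running the driver installer, it can help to confirm that the build tools are actually on the PATH. A minimal sketch, where `missing_tools` is a hypothetical helper name and the tool list (gcc, make) is an assumption about what the installer needs:

```shell
#!/bin/sh
# Print the names of any tools that are not on PATH.
# Succeeds (and prints nothing) when every tool is found.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}
```

For example, `missing_tools gcc make || sudo apt install build-essential` installs the package only when one of the tools is absent.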
Download the .run file for the NVIDIA vGPU Linux guest VM driver from the NVIDIA site.
This is a special driver that comes with the NVIDIA vGPU software. It is not a stock NVIDIA driver available outside of that product.
Copy the NVIDIA vGPU Linux driver package (for example, the NVIDIA-Linux-x86_64-440.43-grid.run file) into the Linux VM's file system, then run the installer:
sudo sh ./NVIDIA-Linux-x86_64-440.43-grid.run
After the installation finishes, verify the driver with nvidia-smi:
ubuntu@ubuntu-tesla:~$ nvidia-smi
Wed Jan 22 12:05:05 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.43       Driver Version: 440.43       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID T4-4Q          On   | 00000000:02:04.0 Off |                  N/A |
| N/A   N/A    P8    N/A /  N/A |    272MiB /  4064MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
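For a scripted check inside the guest, nvidia-smi can emit the GPU name and total memory as CSV with `--query-gpu=name,memory.total --format=csv,noheader`. The sketch below parses one such line; `describe_vgpu` is a hypothetical helper, and it assumes the vGPU name starts with "GRID" as in the 440.43 output above (other driver releases may name it differently).

```shell
#!/bin/sh
# Reads one CSV line such as "GRID T4-4Q, 4064 MiB" on stdin and reports
# the vGPU profile and framebuffer; fails if the name does not start with GRID.
describe_vgpu() {
    IFS=, read -r name mem || return 1
    mem=${mem# }  # drop the space that follows the comma
    case $name in
        "GRID "*) echo "vGPU profile ${name#GRID } with $mem framebuffer" ;;
        *) echo "not a GRID vGPU: $name" >&2; return 1 ;;
    esac
}
```

In the VM this would be used as `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | describe_vgpu`.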