Jan 2nd, 2022
- System Specification Check
- NVIDIA Driver Installation
- CUDA Toolkit Installation
- cuDNN Installation
- Nvidia Documentation
- Check your system architecture to select correct installers for your platform
$ uname -m
$ dpkg --print-architecture
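The two commands report the same architecture under different names (for example `x86_64` versus `amd64`), which can be confusing when picking an installer. A minimal sketch of the mapping, assuming only the three architectures listed here matter; `kernel_to_deb_arch` is a hypothetical helper name, not part of any NVIDIA or Debian tool:

```shell
# Map a kernel machine name (as printed by `uname -m`) to the Debian
# architecture name (as printed by `dpkg --print-architecture`).
kernel_to_deb_arch() {
    case "$1" in
        x86_64)  echo "amd64" ;;
        aarch64) echo "arm64" ;;
        ppc64le) echo "ppc64el" ;;
        *)       echo "unknown"; return 1 ;;  # anything else is not covered by this sketch
    esac
}
```

Usage: `kernel_to_deb_arch "$(uname -m)"` should print the same value as `dpkg --print-architecture` on the architectures covered above.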
- Remove old installation
$ sudo apt-get purge 'nvidia-*'
$ sudo apt-get update
$ sudo apt-get autoremove # DO NOT skip this line
- Search for latest version of Nvidia driver
$ apt search nvidia-driver
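The search output can be long, so it may help to reduce it to the highest numbered driver branch. The sketch below assumes package names of the form `nvidia-driver-<number>` at the start of each line and skips variants such as `-server`; `latest_driver_branch` is a hypothetical helper name. The highest number is not necessarily the best choice for every GPU, so treat the result as a starting point.

```shell
# Read package search output on stdin and print the highest numbered
# nvidia-driver-<number> branch. Lines for variants such as
# nvidia-driver-<number>-server are ignored.
latest_driver_branch() {
    sed -n -E 's#^nvidia-driver-([0-9]+)([ /].*)?$#\1#p' | sort -n | tail -n 1
}
```

Usage: `apt-cache search --names-only '^nvidia-driver-[0-9]+$' | latest_driver_branch`, then substitute the printed number for `<version>` in the install commands.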
- Install Nvidia libraries
$ sudo apt install libnvidia-common-<version>
$ sudo apt install libnvidia-gl-<version>
- Install Nvidia driver
$ sudo apt install nvidia-driver-<version>
- Reboot and check for the installation
$ nvidia-smi
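For a scripted check after the reboot, the following sketch wraps `nvidia-smi` so that a missing binary and a failing query can be told apart. The function name `check_nvidia_driver` and the return codes 1 and 2 are choices made for this example only.

```shell
# Print the GPU name and driver version, or explain why that is not possible.
# Returns 1 if nvidia-smi is not on PATH, 2 if the query itself fails.
check_nvidia_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver may not be installed" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader || return 2
}
```

Run `check_nvidia_driver` after the reboot; on a working installation it should print one line per GPU.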
- Install kernel headers and development packages for your currently running kernel
$ sudo apt-get install linux-headers-$(uname -r)
- Download and install CUDA Toolkit
- CUDA Toolkit from Nvidia Developer
- Select target platform
- Recommendation: pick the deb [network] option for Installer Type
- Follow the installation instructions on the download page to install CUDA Toolkit
- To include GDS package with CUDA Toolkit
- GDS Overview: CUDA GPUDirect Storage
$ sudo apt-get install nvidia-gds
- Setup environment
- Configure the $PATH and $LD_LIBRARY_PATH variables with the following script:
CUDA_HOME=/usr/local/cuda
PATH=${CUDA_HOME}/bin${PATH:+:${PATH}}
LD_LIBRARY_PATH=${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH
export CUDA_HOME
export PATH
- Add the script to either:
- ~/.bashrc for user session usage
- /etc/profile for system-wide usage
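Because `~/.bashrc` can be sourced more than once in a session, the plain script may add the same CUDA directories repeatedly. A possible variant that only prepends a directory when it is missing is sketched below; `prepend_once` is a hypothetical helper, and `/usr/local/cuda` is the same location assumed by the script above.

```shell
# Prepend a directory to a colon-separated list unless it is already there.
# $1 = directory to add, $2 = current list (may be empty)
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;           # already present: leave unchanged
        *)        printf '%s\n' "$1${2:+:$2}" ;;  # otherwise put it in front
    esac
}

CUDA_HOME=/usr/local/cuda
PATH=$(prepend_once "${CUDA_HOME}/bin" "${PATH}")
LD_LIBRARY_PATH=$(prepend_once "${CUDA_HOME}/lib64" "${LD_LIBRARY_PATH:-}")
export CUDA_HOME PATH LD_LIBRARY_PATH
```

Sourcing this block several times leaves `PATH` and `LD_LIBRARY_PATH` with a single CUDA entry each.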
- Setup POWER9
- Check NVIDIA Persistence Daemon
$ systemctl status nvidia-persistenced
- If it is not loaded
$ sudo systemctl enable nvidia-persistenced
- Reboot and check for installation
$ nvcc --version
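The cuDNN download later asks for the matching CUDA version, so it can be handy to pull just the release number out of the `nvcc` output. This sketch assumes the output contains a line like `Cuda compilation tools, release 11.5, V11.5.119` (the version shown is illustrative); `cuda_release` is a hypothetical helper name.

```shell
# Read `nvcc --version` output on stdin and print the CUDA release,
# i.e. the <major>.<minor> number that follows the word "release".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

Usage: `nvcc --version | cuda_release`.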
- Download cuDNN:
- Nvidia cuDNN from Nvidia Developer (local installer)
- NVIDIA Developer Program Membership is required to download
- Select the cuDNN build that matches your CUDA version and target platform
- Install cuDNN
- Import CUDA GPG key
$ sudo dpkg -i <downloaded-file>
$ sudo apt-key add /var/cudnn-local-repo-*/7fa2af80.pub
$ sudo apt-get update
- To auto-match version of cuDNN v8 with version of CUDA when installing:
$ sudo apt-get install libcudnn8
$ sudo apt-get install libcudnn8-dev
$ sudo apt-get install libcudnn8-samples
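To confirm which CUDA release the installed cuDNN package was built for, the package version string can be inspected. The sketch below assumes the version carries a `+cuda<major>.<minor>` suffix (for example `8.3.1.22-1+cuda11.5`, an illustrative value); if your package uses a different scheme the function prints nothing. `cudnn_cuda_release` is a hypothetical helper name.

```shell
# Read a cuDNN package version string on stdin and print the CUDA release
# encoded after "+cuda". Prints nothing if no such suffix is present.
cudnn_cuda_release() {
    sed -n 's/.*+cuda\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

Usage: `dpkg-query -W -f='${Version}\n' libcudnn8 | cudnn_cuda_release`, then compare the result with the release reported by `nvcc --version`.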
- Operating System: Ubuntu 20.04 x86_64 (64-bit)
- Architecture: amd64
- GPU: Nvidia GeForce GTX 1050
- Installation with success on: Jan 2nd, 2022
Hey @minhhieutruong0705,
I have set up GDS, but I am hitting this error while running experiments. By any chance, did you see the same errors in dmesg?
Running test benchmarks:
I found a few articles saying that GPUDirect RDMA is not supported on GeForce, but since you confirmed that GDS was working on an RTX 3090, I wanted to double-check.
https://www.reddit.com/r/nvidia/comments/irvk1n/does_rtx_30_series_offer_gpu_direct_storage/
I'm also getting these errors in fstat (BAR1-map errors):
GPU - RTX 3090
CUDA 12.1
CUDA driver 530.xx