Created
May 2, 2024 11:56
tinygrad-p2p-patched-driver.sh
# This gist is a follow-up to a previous gist: https://gist.github.com/morgangiraud/ffa45e76b6891cd4e37e90d75b8be37b
# See the article here: https://morgangiraud.medium.com/multi-gpu-nvidia-p2p-capabilities-and-debugging-tips-fb7597b4e2b5
# It provides tips and tricks for installing the tinygrad-patched Nvidia open kernel modules, which give P2P capabilities
# to the 40** series!
### The operations below are invasive, so the aim is to minimize potential issues.
### Important: Verify that the version reported by nvidia-smi matches exactly the version we intend to use.
### At the time of this writing, the reported version is 550.78.
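# A minimal sketch of the version check described above. It assumes nvidia-smi is on
# the PATH; EXPECTED_VERSION and the helper name versions_match are illustrative and
# not part of the original procedure.

```shell
# Expected driver version (assumption: taken from the note above).
EXPECTED_VERSION="550.78"

# Succeeds only when the two version strings are identical.
versions_match() {
  [ "$1" = "$2" ]
}

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
  # Take the driver version reported for the first GPU.
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n1 || true)"
  if versions_match "$EXPECTED_VERSION" "$installed"; then
    echo "Driver version OK: $installed"
  else
    echo "Driver version mismatch: expected $EXPECTED_VERSION, got '$installed'" >&2
  fi
fi
```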
# To keep driver versions consistent, it's crucial to start with a clean slate.
# We will remove the currently installed Nvidia driver and replace it with a custom patched version.
# It's imperative to match the version precisely with the one from the tinygrad repository to avoid dependency conflicts.
# Let's go!
# Remove the driver.
sudo apt remove nvidia-driver-550-open
# Obtain the patched driver source code.
git clone git@github.com:tinygrad/open-gpu-kernel-modules.git
cd open-gpu-kernel-modules
# Note: The tinygrad team's modifications are based on version 550.54 of the Nvidia driver.
# It is crucial to integrate Nvidia's changes up to the version of the driver you previously installed.
# To do so:
# Add the official Nvidia repository as an upstream remote. This allows us to fetch updates directly from Nvidia.
git remote add upstream git@github.com:NVIDIA/open-gpu-kernel-modules.git
# Fetch from all configured remotes, including upstream, to ensure we have the latest changes.
git remote update
# Before merging, review the commit history from the Nvidia repository to identify relevant updates.
# Pay special attention to tags and commit messages that correlate with driver versions.
git log upstream/main
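# One way to spot the upstream release that corresponds to your driver, sketched under
# the assumption that Nvidia tags each release with its plain version number (for
# example a tag named 550.78). The helper name find_version_tag is hypothetical.

```shell
# Print the tag that exactly equals the wanted version, reading candidate tags from stdin.
# Returns non-zero when no tag matches.
find_version_tag() {
  grep -Fx -- "$1"
}

# Typical use inside the cloned repository, once the upstream remote has been fetched:
#   git tag --list | find_version_tag "550.78"
```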
# Merge changes from the Nvidia upstream. Specifically, merge up to the version that matches your previously installed driver.
# In this scenario, we merge up to the most recent update provided by Nvidia.
git merge upstream/main
# Merging may result in conflicts.
# If conflicts arise, carefully determine which files are affected.
# For non-critical files like the README, you can safely resolve the conflict by
# keeping the tinygrad version of the file.
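# A sketch of how the conflict handling above could look. Because upstream is merged
# into the tinygrad branch, "ours" refers to the tinygrad side. The file name README.md
# and the helper names list_conflicted_files and keep_ours are assumptions for illustration.

```shell
# List files that are still in a conflicted (unmerged) state.
list_conflicted_files() {
  git diff --name-only --diff-filter=U
}

# Resolve a conflicted file by keeping the version from the current branch,
# then mark it as resolved.
keep_ours() {
  git checkout --ours -- "$1" && git add -- "$1"
}

# Typical use during a conflicted merge:
#   list_conflicted_files
#   keep_ours README.md
#   git commit --no-edit
```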
# Clean up previous build artifacts to prepare for a fresh build.
make clean
# Ensure none of the Nvidia kernel modules are currently loaded to avoid conflicts.
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia_fs nvidia
# If the 'nvidia_drm' module cannot be removed because the GUI is using it, switch to a text-only target.
# This is common if you have a graphical interface like GNOME or KDE running on Ubuntu.
sudo systemctl isolate multi-user.target
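# To confirm the modules are really unloaded before building, the loaded-module list
# can be inspected. This sketch reads /proc/modules by default; the helper name
# loaded_nvidia_modules and its optional file argument are illustrative.

```shell
# Print the names of loaded kernel modules whose name starts with "nvidia".
# The first column of /proc/modules is the module name.
loaded_nvidia_modules() {
  awk '$1 ~ /^nvidia/ { print $1 }' "${1:-/proc/modules}"
}

# An empty result means no Nvidia module is loaded.
if [ -r /proc/modules ] && [ -z "$(loaded_nvidia_modules)" ]; then
  echo "No Nvidia kernel modules are loaded."
fi
```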
# Compile the Nvidia modules with specific compiler settings for compatibility.
# The compiler you use should be the one that was used to compile your Linux kernel.
CC=gcc-12 CXX=g++ make modules -j$(nproc)
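# A rough way to find out which compiler built the running kernel is to look at
# /proc/version. The sketch below assumes an Ubuntu-style string that contains a token
# such as "gcc-12"; other distributions may format this differently, and the helper
# name kernel_gcc is hypothetical.

```shell
# Extract the first "gcc-<major>" token from a kernel version string read on stdin.
kernel_gcc() {
  grep -oE 'gcc-[0-9]+' | head -n1
}

if [ -r /proc/version ]; then
  echo "Kernel built with: $(kernel_gcc < /proc/version || true)"
fi
```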
# Package the compiled modules into a Debian package for easy installation and management.
# Use 'checkinstall' to create a Debian package instead of installing the modules directly.
CC=gcc-12 CXX=g++ sudo checkinstall make modules_install -j$(nproc)
# When prompted by 'checkinstall', name the package meaningfully,
# e.g., 'nvidia-driver-550-open-patch-tinygrad',
# and set the version to match the current branch or your custom version,
# such as '550.78-p2p'.
# The installation process will add the new modules to the kernel directory:
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-uvm.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-modeset.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-drm.ko
# - /lib/modules/6.5.0-28-generic/kernel/drivers/video/nvidia-peermem.ko
# Update the module dependency files to ensure the system recognizes the new modules.
sudo depmod
# Verify the installation by checking the functionality of the Nvidia System Management Interface.
nvidia-smi
# Reboot the system to finalize the installation and start using the new driver modules.
sudo reboot
###
# Troubleshooting
###
# Ensure the driver file used is one of the above
modinfo -F filename nvidia
modinfo -F version nvidia
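# The filename check can be scripted. This sketch assumes the modules were installed
# under kernel/drivers/video/ as listed above; the helper name is_expected_module_path
# is hypothetical.

```shell
# Succeeds when the given module path lives in the install directory
# of the given kernel release.
is_expected_module_path() {
  case "$1" in
    "/lib/modules/$2/kernel/drivers/video/"*.ko) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on the running system: warn when modinfo resolves to another copy.
if command -v modinfo >/dev/null 2>&1; then
  path="$(modinfo -F filename nvidia 2>/dev/null || true)"
  if [ -n "$path" ] && ! is_expected_module_path "$path" "$(uname -r)"; then
    echo "Warning: nvidia module resolves to an unexpected location: $path" >&2
  fi
fi
```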
# If you see a discrepancy, you might need to remove another driver that is still installed.
# Unload the Nvidia modules again.
sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia
# Remove any duplicate Nvidia module files if found.
# Do that for each of the files installed above (nvidia.ko, nvidia-uvm.ko, etc.)
find /lib/modules/$(uname -r) -type f -name "nvidia.ko"
rm ...
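# The same search can cover all five module files in one pass. A sketch, assuming
# uncompressed .ko files; the helper name find_nvidia_module_files is hypothetical.
# It only lists files and does not delete anything.

```shell
# List every Nvidia module file found below the given directory.
find_nvidia_module_files() {
  find "$1" -type f \( -name 'nvidia.ko' -o -name 'nvidia-uvm.ko' \
    -o -name 'nvidia-modeset.ko' -o -name 'nvidia-drm.ko' \
    -o -name 'nvidia-peermem.ko' \)
}

# Search the module tree of the running kernel, if it exists.
if [ -d "/lib/modules/$(uname -r)" ]; then
  find_nvidia_module_files "/lib/modules/$(uname -r)"
fi
```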
# Update the initial RAM filesystem to ensure the system uses the correct driver version at boot.
sudo update-initramfs -u
# Rebuild the module dependency map.
sudo depmod
# Re-check the module information:
modinfo -F filename nvidia
# If it looks good, reboot one last time
sudo reboot
# and check that you can query the driver.
nvidia-smi