New tested approach: build the Nvidia driver for the rt kernel from the non-rt OS, where the driver is already installed (tested on July 19, 2024 on Zorin 17, equivalent to Ubuntu 23.10).
Important
Before starting the process below, make sure your non-rt kernel and the installed rt kernel have the same or a close major version. In my most recent test case, it was 6.5.0 for generic vs 6.6.40 for rt.
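The version comparison can be checked from a shell before going further. This is a minimal sketch, not part of the tested procedure; the helper names kernel_major_minor and same_major are made up for illustration.

```shell
# Hypothetical helpers (not from any package) for comparing kernel releases.

# Print the "major.minor" part of a release string, e.g. 6.5.0 -> 6.5
kernel_major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed when both release strings share the same major version number.
same_major() {
    [ "${1%%.*}" = "${2%%.*}" ]
}

# Show the running kernel, then every kernel that has a modules directory.
echo "running: $(uname -r) ($(kernel_major_minor "$(uname -r)"))"
ls /lib/modules 2>/dev/null || true
```

For the versions from the test case, same_major 6.5.0 6.6.40 succeeds, because both start with 6.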
Step. 0: First, install the rt kernel, but don't boot into it yet. Install the Nvidia driver in your non-rt OS first. This was probably done when you installed your OS, but check whether you have the nvidia-kernel-source package; if not, install it:
sudo apt install nvidia-kernel-source-XXX
Step. 1: Now cd into the nvidia-kernel-source directory:
cd "$(dpkg -L nvidia-kernel-source-XXX | grep -m 1 "nvidia-drm" | xargs dirname)"
Step. 2: Build the Nvidia driver with IGNORE_PREEMPT_RT_PRESENCE=1. Note that the command below uses $(uname -r), which is only correct once you have booted into the realtime OS. If you are following these steps in order, you are still in the non-rt OS, so replace $(uname -r) with the name of your installed rt kernel.
# Build Nvidia driver with IGNORE_PREEMPT_RT_PRESENCE=1
sudo env NV_VERBOSE=1 \
make -j$(nproc) NV_EXCLUDE_BUILD_MODULES='' \
KERNEL_UNAME=$(uname -r) \
IGNORE_XEN_PRESENCE=1 \
IGNORE_CC_MISMATCH=1 \
IGNORE_PREEMPT_RT_PRESENCE=1 \
SYSSRC=/lib/modules/$(uname -r)/build \
LD=/usr/bin/ld.bfd \
modules
sudo mv *.ko /lib/modules/$(uname -r)/updates/dkms/
The last line moves the .ko files from the Nvidia kernel source folder to the rt kernel's module directory.
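Step 2 asks for the name of the installed rt kernel in place of $(uname -r). The sketch below is one way to look it up. It assumes the rt kernel's release name contains "-rt" (for example 5.15.21-rt30); the helper name list_rt_kernels and the KVER variable are hypothetical.

```shell
# Hypothetical helper: list kernel releases under a modules directory
# whose name contains "-rt" (assumed naming, e.g. 5.15.21-rt30).
list_rt_kernels() {
    modules_dir="${1:-/lib/modules}"
    for d in "$modules_dir"/*-rt*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Pick the first match and show the values the build command would use.
KVER="$(list_rt_kernels | head -n 1)"
if [ -n "$KVER" ]; then
    echo "KERNEL_UNAME=$KVER"
    echo "SYSSRC=/lib/modules/$KVER/build"
else
    echo "no kernel with -rt in its name found under /lib/modules"
fi
```

With KVER set, "$KVER" can stand in wherever the build command has $(uname -r). If /lib/modules/$KVER/updates/dkms/ does not exist yet, it probably needs to be created with sudo mkdir -p before the mv.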
Step. 3: Now reboot with your realtime kernel.
Step. 4: After you get into the rt system, run:
sudo depmod -a
sudo update-initramfs -k $(uname -r) -u
Step. 5: Then reboot again and use nvidia-smi to check whether the Nvidia driver loaded successfully.
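If nvidia-smi fails, it helps to know whether the realtime kernel is actually running and whether the nvidia module is loaded at all. The following is a rough sketch under two assumptions: that the rt kernel reports PREEMPT_RT (or "PREEMPT RT" on older patches) in uname -v, and that the module is named nvidia. The helper names are hypothetical.

```shell
# Hypothetical helpers for a post-reboot sanity check.

# Succeed when a `uname -v` style string advertises a PREEMPT_RT kernel.
is_rt_version() {
    case "$1" in
        *PREEMPT_RT*|*"PREEMPT RT"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed when the nvidia module is listed in a /proc/modules style file.
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}"
}

if is_rt_version "$(uname -v)"; then
    echo "kernel: realtime"
else
    echo "kernel: not realtime"
fi

if nvidia_module_loaded 2>/dev/null; then
    echo "nvidia: loaded"
else
    echo "nvidia: not loaded"
fi
```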
Note
If you need to rebuild the NV driver in the RT system, just repeat steps 1-5 and skip step 3.
To install the Nvidia driver on a PC running a real-time kernel, follow the tutorials here: https://github.com/ApolloAuto/apollo/blob/master/docs/howto/how_to_install_apollo_kernel.md
https://github.com/cacao-org/cacao/wiki/OS-install-RTC
The rt kernel I tested at this moment is 5.15.21-rt30 (Feb 2022).
Follow a community contributed tutorial to install the latest stable RT_PREEMPT version https://docs.ros.org/en/foxy/Tutorials/Building-Realtime-rt_preempt-kernel-for-ROS-2.html
Open Software & Updates and, in the Ubuntu Software menu, tick the ‘Source code’ box.
Install dependencies:
sudo apt-get update
sudo apt-get build-dep linux
sudo apt-get install build-essential bc curl ca-certificates gnupg2 lsb-release libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf fakeroot
sudo apt update
sudo apt install zstd
cd ~
mkdir rt_kernel
cd rt_kernel
wget https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.21.tar.gz
wget https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.21.tar.sign
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/older/patch-5.15.21-rt30.patch.gz
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/older/patch-5.15.21-rt30.patch.sign
gunzip linux-5.15.21.tar.gz
# tar -xzf linux-5.15.21.tar.gz
gunzip patch-5.15.21-rt30.patch.gz
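For a different kernel or patch version, the four download URLs follow the same pattern. The sketch below just rebuilds them from two values. The function name rt_kernel_urls is hypothetical, and it assumes the patch sits under the "older/" directory as in the wget commands above, which may not hold for the newest patch of a series.

```shell
# Hypothetical helper: print the four download URLs for a kernel/patch pair.
# $1 = kernel version (e.g. 5.15.21), $2 = rt suffix (e.g. rt30)
# Assumes the patch lives under "older/", as in the wget commands above.
rt_kernel_urls() {
    version="$1"
    rt="$2"
    major="${version%%.*}"                              # e.g. 5
    series="$(printf '%s' "$version" | cut -d. -f1,2)"  # e.g. 5.15
    base="https://mirrors.edge.kernel.org/pub/linux/kernel/v${major}.x"
    patch="https://cdn.kernel.org/pub/linux/kernel/projects/rt/${series}/older"
    echo "${base}/linux-${version}.tar.gz"
    echo "${base}/linux-${version}.tar.sign"
    echo "${patch}/patch-${version}-${rt}.patch.gz"
    echo "${patch}/patch-${version}-${rt}.patch.sign"
}

# Prints the same four URLs used above.
rt_kernel_urls 5.15.21 rt30
```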
Verify the integrity of the kernel files:
gpg2 --verify linux-*.tar.sign
gpg2 --verify patch-*.patch.sign
tar xf linux-*.tar
cd linux-*/
patch -p1 < ../patch-*.patch
cp /boot/config-$(uname -r) .config
yes '' | make oldconfig
make menuconfig
Apply these modifications:
- General setup / Preemption Model
  - Fully Preemptible Kernel (Real-Time)
- General setup / Timers subsystem
  - High Resolution Timer Support
- General setup / Timers subsystem / Timer tick handling
  - Full dynticks system (tickless)
- Processor type and features / Timer frequency
  - 1000 Hz
- Power management and ACPI options / CPU Frequency scaling
  - CPU Frequency scaling (CPU_FREQ [=y]) -> Default CPUFreq governor -> (X) performance
- Cryptographic API > Certificates for signature checking (at the very bottom of the list) > Provide system-wide ring of trusted keys > Additional X.509 keys for default system keyring
  - Remove “debian/canonical-certs.pem” from the prompt and press Ok.
Save and exit.
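After saving, the resulting .config can be checked to confirm the menu choices were recorded. This is a minimal sketch that assumes the menu entries map to the Kconfig symbols listed in the loop; the function name check_rt_config is hypothetical.

```shell
# Hypothetical helper: report which of the expected options are not "=y"
# in a kernel .config. Returns 0 when all are set, 1 otherwise.
check_rt_config() {
    config_file="${1:-.config}"
    missing=0
    for opt in CONFIG_PREEMPT_RT \
               CONFIG_HIGH_RES_TIMERS \
               CONFIG_NO_HZ_FULL \
               CONFIG_HZ_1000 \
               CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE; do
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "not set: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Usage from the kernel source directory:
#   check_rt_config .config && echo "all realtime options are set"
```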
make -j$(nproc) deb-pkg
or
make -j $(nproc)
sudo make bzImage
sudo make INSTALL_MOD_STRIP=1 modules_install -j $(nproc)
sudo make install
According to the post by cacao, INSTALL_MOD_STRIP is important: it shrinks the initial ramdisk size by ~90%. In some cases, when it is not used, the initial ramdisk would not load and the patched kernel would not boot.
If you see these errors:
- "CONFIG_X86_X32 enabled but no binutils support": change CONFIG_X86_X32 to n
- "sed: can't read modules.order: No such file or directory":
  - set CONFIG_SYSTEM_TRUSTED_KEYS=""
  - set CONFIG_SYSTEM_REVOCATION_KEYS=""
- "Missing file: arch/x86/boot/bzImage": you need to run sudo make bzImage before modules_install
- "bin/sh: 1: zstd: not found": you need to install Zstandard with sudo apt install zstd
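The two key options can also be emptied directly in .config instead of going back through menuconfig. The sketch below uses sed for that. The function name clear_config_string is hypothetical, and the trusted-keys option is written as CONFIG_SYSTEM_TRUSTED_KEYS (plural), which is how it is spelled in the kernel's Kconfig.

```shell
# Hypothetical helper: set a string option in a kernel .config to "".
# $1 = option name, $2 = path to the .config file
clear_config_string() {
    sed -i "s|^$1=.*|$1=\"\"|" "$2"
}

# Usage from the kernel source directory:
#   clear_config_string CONFIG_SYSTEM_TRUSTED_KEYS .config
#   clear_config_string CONFIG_SYSTEM_REVOCATION_KEYS .config
```

The kernel tree also ships a scripts/config tool that should do the same with --set-str SYSTEM_TRUSTED_KEYS "".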
Edit the /etc/default/grub file: change GRUB_DEFAULT=0 to GRUB_DEFAULT=saved, and add GRUB_SAVEDEFAULT=true.
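The same GRUB edit can be scripted. This is a sketch only: the function name set_grub_saved_default is hypothetical, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: switch a grub defaults file to "saved" mode.
# $1 = path to the file (e.g. a copy of /etc/default/grub)
set_grub_saved_default() {
    grub_file="$1"
    # Replace whatever GRUB_DEFAULT is currently set to.
    sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT=saved/' "$grub_file"
    # Append GRUB_SAVEDEFAULT only if it is not already present.
    grep -q '^GRUB_SAVEDEFAULT=' "$grub_file" ||
        echo 'GRUB_SAVEDEFAULT=true' >> "$grub_file"
}

# Usage:
#   cp /etc/default/grub /tmp/grub.test
#   set_grub_saved_default /tmp/grub.test   # inspect, then apply to the real file with sudo
#   sudo update-grub                        # regenerate the boot menu afterwards
```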
According to https://frankaemika.github.io/docs/installation_linux.html#setting-up-the-real-time-kernel:
sudo addgroup realtime
sudo usermod -a -G realtime $(whoami)
Afterwards, add the following limits to the realtime group in /etc/security/limits.conf:
@realtime soft rtprio 99
@realtime soft priority 99
@realtime soft memlock 102400
@realtime hard rtprio 99
@realtime hard priority 99
@realtime hard memlock 102400
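The limits only apply to sessions started after the group change, so it is worth checking that the current session is in the realtime group. This is a minimal sketch; the helper name has_group is hypothetical.

```shell
# Hypothetical helper: succeed when group $1 appears in the space-separated
# group list $2 (the format printed by `id -nG`).
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_group realtime "$(id -nG)"; then
    echo "current session is in the realtime group"
else
    echo "not in the realtime group yet (log out and back in after usermod)"
fi
```

Once the session is in the group, ulimit -r and ulimit -l in bash should report the rtprio and memlock values configured above.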
Run the following to install CUDA 11.3
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda-repo-ubuntu2004-11-3-local_11.3.0-465.19.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-3-local_11.3.0-465.19.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-3-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
The corresponding driver version is 465, but 465 failed, so install 470 instead:
sudo apt-get install nvidia-driver-470
Run the following to build Nvidia driver 470
sudo apt-get install nvidia-kernel-source-470
# Change to the Nvidia driver source directory (resolved from the installed 470 package)
cd "$(dpkg -L nvidia-kernel-source-470 | grep -m 1 "nvidia-drm" | xargs dirname)"
# Build Nvidia driver with IGNORE_PREEMPT_RT_PRESENCE=1
sudo env NV_VERBOSE=1 \
make -j$(nproc) NV_EXCLUDE_BUILD_MODULES='' \
KERNEL_UNAME=$(uname -r) \
IGNORE_XEN_PRESENCE=1 \
IGNORE_CC_MISMATCH=1 \
IGNORE_PREEMPT_RT_PRESENCE=1 \
SYSSRC=/lib/modules/$(uname -r)/build \
LD=/usr/bin/ld.bfd \
modules
sudo mv *.ko /lib/modules/$(uname -r)/updates/dkms/
sudo depmod -a
sudo update-initramfs -k $(uname -r) -u
Reboot the system:
sudo reboot
Run nvidia-smi to check the driver.
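If the driver does not load, one thing to check is whether the installed nvidia module was built for the running kernel. A module's vermagic string starts with the kernel release it was built against, so the two can be compared. This is a sketch; the helper name vermagic_matches is hypothetical.

```shell
# Hypothetical helper: succeed when the first field of a vermagic string
# equals the given kernel release.
# $1 = vermagic string, $2 = kernel release
vermagic_matches() {
    [ "${1%% *}" = "$2" ]
}

# Ask modinfo for the vermagic of the installed nvidia module (empty if absent).
vm="$(modinfo -F vermagic nvidia 2>/dev/null || true)"
if vermagic_matches "$vm" "$(uname -r)"; then
    echo "nvidia module was built for $(uname -r)"
else
    echo "nvidia module vermagic: '${vm:-not found}', running kernel: $(uname -r)"
fi
```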
A guide on how to install the Nvidia driver manually: https://www.if-not-true-then-false.com/2021/debian-ubuntu-linux-mint-nvidia-guide/
Install dependencies
sudo apt-get update -y
sudo apt-get install -y libglvnd-dev
# 1. Download NVIDIA driver as a .run file
# 2. Stop X-Server
sudo service lightdm stop
# 3. Blacklist Nouveau driver
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
# Insert into file:
# blacklist nouveau
# options nouveau modeset=0
# 4. Update kernel initramfs
sudo update-initramfs -u
sudo reboot # I'm not sure if needed
# 5. Install driver!
chmod +x NVIDIA-Linux-*.run
sudo IGNORE_PREEMPT_RT_PRESENCE=1 bash NVIDIA-Linux-*.run # Insert downloaded .run file
# 6. Reboot
sudo reboot
https://gist.github.com/pantor/9786c41c03a97bca7a52aa0a72fa9387
If the nvidia-installer created modprobe files to disable the Nouveau driver and you want to re-enable Nouveau later, you will need to delete these files:
/usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf
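Before deleting anything, it can help to see which of the two files actually exist. The following is a small sketch; the function name list_nouveau_blockers and its optional root-prefix argument are hypothetical and exist only so the check can be tried against a test directory.

```shell
# Hypothetical helper: print whichever nvidia-installer blacklist files exist.
# $1 = optional root prefix (empty for the real filesystem)
list_nouveau_blockers() {
    root="${1:-}"
    for f in /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf \
             /etc/modprobe.d/nvidia-installer-disable-nouveau.conf; do
        [ -e "$root$f" ] && echo "$root$f"
    done
    return 0
}

# List the files present on this system (prints nothing if there are none).
list_nouveau_blockers
```

After removing the listed files with sudo rm, rebuilding the initramfs with sudo update-initramfs -u and rebooting is probably needed, as it was when Nouveau was blacklisted.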