@roadkell
Last active September 12, 2023 09:35
Fixing acpi_call kernel oops on Thinkpads

Intro

TLP, a power management utility for Thinkpads and other laptops, uses the tpacpi-bat script for battery calibration and for setting charge thresholds (on Thinkpads xx20 and later). tpacpi-bat in turn relies on acpi_call, a Linux kernel module that enables calls to ACPI methods through /proc/acpi/call. acpi_call can also be used for hybrid graphics switching and other power management tasks.

What happened

As explained here and here, an upstream kernel commit made seek support mandatory for procfs entries. A module that does not provide it causes a null pointer dereference on kernels >=5.13.0, including the kernel shipped with Ubuntu 21.10. Consequently, pre-1.2.2 versions of acpi_call became incompatible, and calling into them ends in a kernel oops.

How does it show

If you run Ubuntu 21.10 and have acpi-call-dkms installed, you can check if this bug affects you with this command:

sudo dmesg | grep "BUG: kernel NULL pointer dereference" -A 10

If it does, you'll see something like this among the lines:

[45420.141212] BUG: kernel NULL pointer dereference, address: 0000000000000000
[45420.141217] #PF: supervisor instruction fetch in kernel mode
[45420.141220] #PF: error_code(0x0010) - not-present page
[45420.141221] PGD 0 P4D 0 
[45420.141224] Oops: 0010 [#4] SMP NOPTI
[45420.141226] CPU: 3 PID: 85578 Comm: tpacpi-bat Tainted: G      D    O      5.13.0-19-generic #1
...
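
To tell at a glance whether an oops comes from the TLP battery path, the check above can be wrapped in two small helpers. This is only a minimal sketch: the function names count_null_oops and oops_from_tpacpi are hypothetical, and the patterns assume the log format shown in the sample output.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TLP or acpi_call).
# Both read kernel log text from stdin.

# Print the number of NULL pointer oops lines (prints 0 if there are none).
count_null_oops() {
    grep -c 'BUG: kernel NULL pointer dereference' || true
}

# Succeed if any line names tpacpi-bat as the crashing process.
oops_from_tpacpi() {
    grep -q 'Comm: tpacpi-bat'
}
```

After sourcing the file, `sudo dmesg | count_null_oops` prints the number of matching lines, and `sudo dmesg | oops_from_tpacpi && echo "tpacpi-bat is involved"` narrows it down to the case described here.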

How to fix

Thankfully, acpi_call has already been fixed in v1.2.2. Unfortunately, many repositories still ship an outdated version, to say nothing of backports. So, if you happen to have this combo of a Thinkpad xx20 or later, a Linux kernel >=5.13, an acpi_call <1.2.2, and TLP or some other software that uses it, you'll have to manually download, compile, and install a fresh version of acpi_call.
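
The version part of that combo can be checked mechanically. The following is a rough sketch, not an official check: version_lt and is_affected are hypothetical helper names, the comparison relies on `sort -V` from GNU coreutils, and the Thinkpad model and the presence of TLP are not tested at all.

```shell
#!/bin/sh
# Hypothetical helpers for comparing version strings.

# Succeed if version $1 is strictly lower than version $2.
# Assumes a sort that supports -V (version sort), e.g. GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeed if the pair (kernel version, acpi_call version) matches the
# broken combination: kernel >= 5.13 together with acpi_call < 1.2.2.
is_affected() {
    ! version_lt "$1" 5.13 && version_lt "$2" 1.2.2
}

# Example for Debian/Ubuntu (assumes the acpi-call-dkms package is installed):
#   is_affected "$(uname -r)" "$(dpkg-query -W -f='${Version}' acpi-call-dkms)" \
#       && echo "this kernel/module pair is affected"
```

The helpers take the versions as arguments instead of querying the system themselves, so the same check can be used with whatever package manager reports the installed acpi_call version on your distro.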

But first, if you got a "Non-Volatile Variable Storage is About Full" boot error, you'll need to clean up NVRAM, as described here. In our case, this error is caused by kernel dumps filling up the storage. Make sure you have the same error by comparing your dmesg output with the one above, and check the TLP battery FAQ and the UEFI troubleshooting page on ArchWiki.

The commands below are for Debian/dpkg-based distributions, including Ubuntu and its derivatives. If needed, replace them with the appropriate commands for your distro.

⚠️ Warning! Deleting wrong EFI variables may brick your system. Read ArchWiki first. Proceed with caution.

# Check if there are any dumps
sudo ls /sys/firmware/efi/efivars/dump-*

# If found, delete them
sudo rm /sys/firmware/efi/efivars/dump-*
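
Since the only files that should be touched are the dump-* entries, it can help to list and count them before and after deleting. Below is a minimal sketch under that assumption; list_dumps and count_dumps are hypothetical names, and the optional directory argument exists only so the helpers can be tried on a scratch directory first.

```shell
#!/bin/sh
# Hypothetical read-only helpers; they never delete anything.

# Print every dump-* entry in the given directory, one per line.
# Defaults to the efivarfs path used in the commands above.
list_dumps() {
    dir="${1:-/sys/firmware/efi/efivars}"
    for f in "$dir"/dump-*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Print the number of dump-* entries in the given directory.
count_dumps() {
    list_dumps "$1" | wc -l | tr -d ' '
}
```

Running `count_dumps` again after the cleanup and after the next reboot shows whether the dumps are really gone or have been written again.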

Now on to installing the kernel module.

# Remove previously installed acpi-call-dkms package (if any)
sudo apt purge acpi-call-dkms

# Install git (if you don’t have it installed yet)
sudo apt install git

# Clone the repository at nix-community/acpi_call
git clone --branch v1.2.2 https://github.com/nix-community/acpi_call.git

# Navigate to the cloned repository
cd acpi_call

# Prepare dkms.conf file
make dkms.conf

# Copy the module source to the shared sources directory
sudo cp -R . /usr/src/acpi-call-1.2.2

# Add the module to the dkms tree for build
sudo dkms add -m acpi-call -v 1.2.2

# Build the module
sudo dkms build -m acpi-call -v 1.2.2

# Install the module
sudo dkms install -m acpi-call -v 1.2.2

# Reboot
sudo reboot
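
After the reboot, it is worth confirming that the new module is actually in use. The sketch below only checks for the /proc/acpi/call interface mentioned in the intro; the function names are hypothetical, and the optional path argument is added purely so the check can be pointed at any file.

```shell
#!/bin/sh
# Hypothetical post-install check.

# Succeed if the acpi_call interface file exists.
acpi_call_ready() {
    [ -e "${1:-/proc/acpi/call}" ]
}

# Print a one-line status report.
report_acpi_call() {
    if acpi_call_ready "$1"; then
        echo "acpi_call interface present"
    else
        echo "acpi_call interface missing"
    fi
}
```

If the interface is reported missing, `sudo modprobe acpi_call` loads the module by hand, and `dkms status` should mention acpi-call with version 1.2.2 if the build and install steps succeeded.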

Finally, take a moment to notify the maintainers of the package for your distro about the bug and the updated version. For example, here is the bug report for the Debian acpi-call-dkms package, and here is the one for Ubuntu.

EDIT: the proper way of installing the module is taken from here, kudos to @monosoul.

@DiagonalArg
DiagonalArg commented Feb 27, 2022

If I am to remove the /sys/firmware/efi/efivars/dump-* files, am I to also remove the associated /sys/firmware/efi/vars/dump-* directories? Each of those directories looks like this (choosing just one):

$ ls -l efivars/dump*
-rw-r--r-- 1 root root 644 Feb 27 00:12 efivars/dump-type0-10-1-1645912016-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0

$ ls -l vars/dump*
vars/dump-type0-10-1-1645912016-C-cfc8fc79-be2e-4ddc-97f0-9f98bfe298a0:
total 0
-r-------- 1 root root 4096 Feb 27 00:48 attributes
-r-------- 1 root root 4096 Feb 27 00:48 data
-r-------- 1 root root 4096 Feb 27 00:48 guid
-rw------- 1 root root 4096 Feb 27 00:48 raw_var
-r-------- 1 root root 4096 Feb 27 00:48 size

Edit:

I am seeing in the Arch UEFI documentation:

UEFI Runtime Variables Support (efivarfs filesystem - /sys/firmware/efi/efivars). This option is important as this is required to manipulate UEFI runtime variables using tools like /usr/bin/efibootmgr. The configuration option below has been added in kernel 3.10 and later.

`CONFIG_EFIVAR_FS=y`

UEFI Runtime Variables Support (old efivars sysfs interface - /sys/firmware/efi/vars). This option should be disabled to prevent any potential issues with both efivarfs and sysfs-efivars enabled.

`CONFIG_EFI_VARS=n`

Unfortunately, in Ubuntu 20.04, both of these are set to y.

So what are the implications? Should we be deleting dump-* in both? (I have also posted this as a Superuser question.)

@LinuxOnTheDesktop
LinuxOnTheDesktop commented Mar 8, 2022

Thank you, Roadkell. I note the following.

  1. On my affected ThinkPad X230, after applying the fix, I still had the problem. But after deleting the dump files - again - and rebooting, all seems good.

  2. The post by @DiagonalArg asks a question and says the question was also posted on Superuser. The response there was, in summary: you don't need to worry about those directories.

@frzb
frzb commented Mar 23, 2022

Thank you so much.

No to war!

@DiagonalArg
Thank you, Roadkell. I note the following.

  1. On my affected ThinkPad X230, after applying the fix, I still had the problem. But after deleting the dump files - again - and rebooting, all seems good.

I didn't apply the fix. I just removed tlp, but I nevertheless had the same problem. I had to delete the files, reboot, delete again, and reboot a second time before the files (and directories) were finally gone. I think this has killed a couple of laptops: tlp was kept (not patched), the files were thought to be deleted (but were not), and a reboot was done, filling the NVRAM.

@Silcet
Silcet commented Apr 20, 2022

This worked like a charm in my Thinkpad P52 with Ubuntu 20.04 and kernel 5.13. Thanks to you I can recalibrate my poor battery that was at 60% capacity.

Thank you so much! ❤️
