How to blacklist particular defective RAM areas on a machine with Linux?
Well, the first and main reason I wrote this, which you probably guessed, is that my laptop recently turned out to have defective RAM and I had to do something about it, as it had become too unstable to work on. The second reason is that I didn't even know there was a way other than just replacing the RAM (for the record, if you can simply replace the RAM stick, I don't recommend using a PC with faulty RAM; however, in my case the laptop has soldered-in RAM, which makes replacing it impossible for me and pretty expensive in a repair shop). The third reason is that even though I learned from a friend what I should use to "fix" my issue, I couldn't really find any reliable info on how to do it today, because in 2023 hardly anyone does it anymore (it was vastly more popular 20 years ago when RAM was expensive).
Even though I try to gather as much correct information as possible, it is likely that some of it is not 100% correct. Not everything that I describe here was done by me, and some of it is based only on the info I could find on the Internet. Additionally, at the end I linked the pages I relied on when trying to achieve this on my machine; there might be more info there which I forgot to describe here. If you spot that something is missing or is not 100% true, correct me in the comments. If you know any other/better methods, feel free to put them in the comments as well.
If you are reading this guide, I can assume you probably use some kind of Linux and probably know that your RAM is faulty. However, I will start with the most basic thing here: testing for faulty RAM. Faulty RAM can make your system unstable. In my case it was visible in Chrome, because every once in a while some tabs (generally just one tab) would crash with an error code related to a memory issue. However, faulty RAM can cause other issues such as random app crashes, system crashes and many more. If you encounter any weird behavior of your PC, it's a good idea to test your RAM. To test your RAM (especially if you use Linux) I suggest using Memtest86+, which is probably already installed on your Linux; if not, you can install it easily. When it is installed there should be a dedicated GRUB entry to boot into Memtest86+. After booting, the RAM test starts automatically and takes about 2h to complete (it depends on your RAM size and speed, but for a modern 16GB machine it's around 2h). The default view in Memtest86+ shows every RAM error, so if you see any red line, your RAM is most probably broken. A pass is done when all individual tests have run (you can see the progress of the individual test as well as of the whole pass).
Okay, so I guess I could kinda scare you off with my long introduction, but in reality blacklisting a RAM area is very easy (if you know how, that is; I lost too much time on it during my first attempt). So I assume you used Memtest86+ and it showed you some errors with the specific RAM addresses that are failing. The solution I will use here is BadRAM, which started out as a separate Linux kernel patch; these days GRUB has a built-in badram command that filters out the bad areas before the kernel starts, so nothing needs to be patched or compiled. The solution presented here therefore assumes that you use GRUB (if not, you need to find a way on your own).
1. Run Memtest86+ and, at the beginning of the first pass, switch the error reporting mode to BadRAM Patterns.
Note: To switch the error reporting mode press c (configuration) -> 3 (Error Reporting Mode) -> 3 (BadRAM Patterns). Make sure the numbers are the same in your version of Memtest86+. This changes the Memtest86+ output to a ready-made BadRAM pattern, which will be needed later.
2. When one whole pass is finished, write down (or use Google Lens to scan) the pattern.
3. If you can boot into your Linux on this machine, do so and go to step 8.
4. If you can't, prepare Linux installation media on a USB drive using a different PC.
5. Locate the /boot/grub/grub.cfg file on the USB drive (!).
6. Append badram #PATTERN# at the end of that file, replacing #PATTERN# with the one you got from Memtest86+, e.g. badram 0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc
7. Boot your faulty PC from the USB drive and chroot into your main installation.
8. Append GRUB_BADRAM="#PATTERN#" at the end of the /etc/default/grub file, replacing #PATTERN# with the one you got from Memtest86+, e.g. GRUB_BADRAM="0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc"
9. Regenerate the GRUB config for your main installation (e.g. by running update-grub or grub-mkconfig -o /boot/grub/grub.cfg; if you came here via the USB drive, do it while still in the chroot env).
10. Exit the chroot env (if you used one) and reboot the PC.
11. Run Memtest86+ again (from your GRUB); this time there should be a clean pass.
Note: If you don't have Memtest86+ in your GRUB and you use a custom bootable USB drive with Memtest86+ instead, it will still show the errors. Only Memtest86+ started from the GRUB that has BadRAM set up should give a clean pass.
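The edit to /etc/default/grub can also be sketched in code. The snippet below is only an illustration in Python (my choice, nothing here requires Python); the function name set_grub_badram is made up for this example. It works on the text of the file, replaces an existing active GRUB_BADRAM line if there is one, and otherwise appends the line at the end. It does not write to disk and does not regenerate the GRUB config, so that step still has to be done afterwards.

```python
import re


def set_grub_badram(config_text: str, pattern: str) -> str:
    """Return the text of /etc/default/grub with one GRUB_BADRAM line.

    `pattern` is the comma-separated list printed by Memtest86+ in
    BadRAM Patterns mode (address,mask,address,mask,...).
    """
    # Basic sanity check: hex values only, and an even number of them.
    values = pattern.split(",")
    if len(values) % 2 or not all(
        re.fullmatch(r"0x[0-9a-fA-F]+", v) for v in values
    ):
        raise ValueError("expected comma-separated hex address,mask pairs")

    # Drop any active GRUB_BADRAM line; commented-out lines are kept.
    kept = [
        line
        for line in config_text.splitlines()
        if not re.match(r"\s*GRUB_BADRAM=", line)
    ]
    kept.append(f'GRUB_BADRAM="{pattern}"')
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "GRUB_DEFAULT=0\nGRUB_TIMEOUT=5\n"
    print(set_grub_badram(sample, "0xa84c8b0c,0xfffffffc"), end="")
```

Working on the text instead of the file itself makes it easy to check the result before copying it into place as root.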
Memtest86+ can return a ready-made BadRAM pattern. It looks something like this: 0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc. This pattern has to end up in /boot/grub/grub.cfg (the GRUB config file) as badram 0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc. However, this file should never be edited manually on an installed system, as it is overwritten every time the GRUB config is regenerated. All user-specified config goes into /etc/default/grub, which is read by the GRUB generating tools. In that file the correct syntax for BadRAM is GRUB_BADRAM="0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc". So the correct way to apply the fix is to put that line at the end of /etc/default/grub and regenerate the GRUB config.
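To get a feeling for what the pattern means, here is a small Python sketch (function names are made up for illustration). The pattern is a list of address,mask pairs, and as I read the GRUB manual an address is filtered when all of its bits that are enabled in the mask agree with the base address. GRUB applies this to page-aligned addresses, so on a real machine the whole page gets dropped; the sketch only shows the mask comparison itself.

```python
def parse_badram(pattern: str) -> list[tuple[int, int]]:
    """Split 'addr,mask,addr,mask,...' into (address, mask) pairs."""
    values = [int(v, 16) for v in pattern.split(",")]
    if len(values) % 2:
        raise ValueError("expected address,mask pairs")
    return list(zip(values[0::2], values[1::2]))


def is_filtered(address: int, pairs: list[tuple[int, int]]) -> bool:
    """True if `address` matches any pair on all bits set in its mask."""
    return any((address & mask) == (base & mask) for base, mask in pairs)


if __name__ == "__main__":
    pairs = parse_badram(
        "0xa84c8b0c,0xfffffffc,0xa04c8ca8,0xfffffffc,0xa84c8c88,0xfffffffc"
    )
    for base, mask in pairs:
        print(f"base={base:#x} mask={mask:#x}")
    # The mask 0xfffffffc ignores the two lowest bits of the address.
    print(is_filtered(0xA84C8B0E, pairs))
    print(is_filtered(0xA84C8B10, pairs))
```

With the mask 0xfffffffc the two lowest bits are ignored, so each pair in my pattern covers a group of four neighbouring byte addresses.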
- PassMark MemTest86 - Memory Diagnostic Tool - Troubleshooting Memory Errors
- GNU GRUB Manual 2.12: badram
- badram kernel parameter not working? - Unix & Linux Stack Exchange
- User friendly way to apply BadRAM patterns - Unix & Linux Stack Exchange
- How to blacklist a correct bad RAM sector according to MemTest86+ error indication? - Unix & Linux Stack Exchange
- BadRAM: Linux kernel support for broken RAM modules
- How to boot with memmap kernel option in Linux UEFI? - Stack Overflow
- Using the memmap Kernel Option - Persistent Memory Documentation
There is a second method for blacklisting specific RAM areas: the memmap kernel parameter. If I understand correctly, it's an older solution.
At first I wanted to use this method, as I didn't understand BadRAM and thought I needed to manually build the BadRAM kernel module into my kernel (I've never compiled the Linux kernel and I didn't want to learn how). However, the issue I encountered with memmap was that it doesn't work on a UEFI installation. I might be wrong here, but this is what I found (I listed some posts about it in the sources).
This kernel param uses a different pattern than BadRAM. One of the sources listed above explains how to convert a BadRAM pattern into a memmap one, so if you're interested go check it out.
In short it should work as follows: you take the memmap pattern, put it as a kernel param in the /etc/default/grub file, regenerate the GRUB config, and it should work.
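I haven't tested this on my machine, so treat the following as a rough Python sketch only. It assumes the memmap=nn[KMG]$ss[KMG] form from the kernel parameter documentation, which marks nn bytes starting at address ss as reserved, and it assumes 4 KB pages. The function name memmap_param is made up for this example.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KB pages


def memmap_param(address: int, pages: int = 1) -> str:
    """Build a memmap= parameter reserving the page(s) around `address`."""
    start = address & ~(PAGE_SIZE - 1)  # round down to the page boundary
    size_kb = pages * PAGE_SIZE // 1024
    return f"memmap={size_kb}K${start:#x}"


if __name__ == "__main__":
    print(memmap_param(0xA84C8B0C))
```

As far as I know the $ character has to be escaped when the parameter is placed in /etc/default/grub, because GRUB would otherwise treat it as the start of a variable; the exact escaping is worth checking against the linked sources.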
I didn't try that one, but the first source I listed explains a potential solution on how to do it in Windows so it might be useful for some of you.
In summary, this is a pretty easy solution that can be applied in a few hours max (counting the Memtest86+ run). However, it took me more than a week to find out how to do it (at some point I even tried to reinstall Linux as a non-UEFI installation, which turned out to be a problem in itself). I know it mostly comes down to the fact that I couldn't connect the info I had, but because of that I wanted to create a short summary/tutorial which would explain everything in an accessible way.
Windows solution
Recently I had an experience of running Windows 10 on this device with a faulty RAM so I looked up how to set it up in Windows. It's kinda similar but you can't use it during Windows install so you may end up with corrupted Windows installation (no way around it really, other than installing Windows on a different PC and moving the drive).
Step by step
Setting it up is easy if you have installed Windows already.
Windows doesn't work with addresses and masks like Linux badram; here you can blacklist particular pages of memory (on most devices a page is a 4KB chunk of RAM). To calculate the required values, just take the address given by memtest86 and remove the last 3 hex digits (4KB is 0x1000) -> 0x1a84c8b0c becomes 0x1a84c8. If you have more than one faulty address in one page, it is enough to specify that page once.
I just want to note that I got a slightly different faulty RAM address under Windows than under Manjaro Linux. For Windows I ran memtest86+ via a Rescuezilla CD, while under Manjaro I was running memtest86+ from my GRUB. Under Linux the faulty address was 0xa84c8b0c, while under Windows I got 0x1a84c8b0c and the 0xa84c8b0c address was not present at all. I don't really know the reason for that.
Then run cmd as administrator and enter the following commands.
Then reboot the device and run the RAMMap tool to check if the RAM is properly blacklisted. You should be able to see it in the "Physical ranges" tab.
If you want to remove the blacklisted pages you may use the following command.
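The page calculation described above (dropping the last 3 hex digits) is the same as shifting the address right by 12 bits. Here is a minimal Python sketch of it, with made-up function names. The bcdedit line it builds is an assumption on my side, based on the badmemorylist setting of bcdedit; double check it against Microsoft's bcdedit documentation before running anything.

```python
PAGE_SHIFT = 12  # 4KB pages: 0x1000 == 1 << 12


def page_frame_numbers(addresses):
    """Turn faulty byte addresses into a sorted list of unique page numbers."""
    return sorted({addr >> PAGE_SHIFT for addr in addresses})


def bcdedit_command(addresses):
    """Format an (assumed) bcdedit call that blacklists the faulty pages."""
    pfns = " ".join(f"{p:#x}" for p in page_frame_numbers(addresses))
    return f"bcdedit /set {{badmemory}} badmemorylist {pfns}"


if __name__ == "__main__":
    # Two faulty addresses in the same 4KB page give a single page number.
    faulty = [0x1A84C8B0C, 0x1A84C8C88]
    print([hex(p) for p in page_frame_numbers(faulty)])
    print(bcdedit_command(faulty))
```

Collecting the page numbers in a set takes care of the "more than one faulty address in one page" case automatically.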
Issue with Windows 10 2004 and 20H2
Some users couldn't use badmemorylist on Windows 10 versions 2004 and 20H2. I tried it with Win10 version 22H2 (the last version of Win10) and it worked, so it seems that at least on 22H2 the bug is no longer present.
Updating Windows
As Windows 10 22H2 is the latest version of Windows 10, this issue is no longer that big of a deal, but when updating between Windows releases the badmemorylist entries are lost and have to be set again. Moreover, they are not used during the update process, so you may end up with a corrupted Windows installation (the same issue as with installing). The same applies to Windows 11 if you want to use it there (my device doesn't support Win11, so I don't think I will be upgrading).
Sources