@yorickdowne
Last active December 17, 2024 11:26
Stabilize an Asus or Intel NUC for 24/7 server operations

The issue

Asus (formerly Intel) NUCs weren't designed for use as 24/7 servers, though they see enthusiastic use as such. Some users report instability. Some of that can be bad RAM or a faulty motherboard or CPU, some is just configuration, and some is plain heat.

The cause can be tough to nail down. Here are some good practices to follow for a stable NUC.

BIOS update

Particularly if it's an older NUC, upgrade the BIOS. These are now hosted by Asus and may be a little hard to find for older models.

  • Get your model number / SKU. For example, NUC10i7FNH
  • On the ASUS download center, look for the Model search box and enter the model number.
  • On the right side of the page the model number will show up, with options underneath. Click on "Drivers and Tools"
  • On the next page select "Bios and Firmware"

User DagoDuck says this about NUC11 firmware: "For the TNTGL357 BIOS (NUC 11), if you're below version 0071, you first have to update to 0071 before being able to apply the latest update (newest version atm is 0077).

To obtain the file for version 0071, you have to modify the URL of the download link, as the older version isn't referenced anywhere on the ASUS website."
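
Before downloading anything, it can help to read the installed BIOS version from within Linux. The following is a minimal sketch, not an official tool. It assumes the version string has the shape PREFIX.NNNN.YYYY.MMDD.HHMM (the sample string in the comment is illustrative, not taken from a real machine) and that the kernel exposes it at /sys/class/dmi/id/bios_version. The helper name needs_intermediate_bios is made up for this example, and the 0071 threshold applies only to the TNTGL357 case quoted above.

```shell
# Hypothetical helper: succeed (exit 0) if the BIOS build number, taken as
# the second dot-separated field of the version string, is lower than the
# required intermediate build. Returns 2 if the string does not look like
# PREFIX.NNNN.... (assumed format, e.g. "TNTGL357.0064.2021.1007.1123").
needs_intermediate_bios() {
    build=$(printf '%s\n' "$1" | cut -s -d. -f2)
    case $build in
        [0-9][0-9][0-9][0-9]) ;;   # exactly four digits, as assumed
        *) return 2 ;;
    esac
    # Compare numerically in awk so leading zeros are treated as decimal.
    awk -v b="$build" -v r="$2" 'BEGIN { exit !(b + 0 < r + 0) }'
}

# Only runs the check where the DMI file is readable.
bios_file=/sys/class/dmi/id/bios_version
if [ -r "$bios_file" ] && needs_intermediate_bios "$(cat "$bios_file")" 0071; then
    echo "Installed BIOS is older than 0071: apply 0071 before the latest update."
fi
```

If the file is not readable on your system, sudo dmidecode -s bios-version should print the same string.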

The current gen 14 Pro BIOS for example is here: https://www.asus.com/displays-desktops/nucs/nuc-mini-pcs/asus-nuc-14-pro/helpdesk_bios?model2Name=ASUS-NUC-14-Pro-Kit

Power settings and heat

The default setting for these is "broil". That means the fans run a lot, which can lead to dust buildup inside until airflow is completely blocked.

[Image: dusty-nuc]

Cleaning this requires removing the board so the fan assembly underneath is accessible.

You can check heat by using smartctl. Install smartmontools:

sudo apt update && sudo apt install smartmontools

Then run sudo smartctl -x /dev/nvme0n1 and look for the temperature of the drive. You'd like to see it below 50C, certainly not above 60C.
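
If you'd rather not read through the full report, the temperature can be picked out with a small filter. This is a rough sketch that assumes smartctl prints a line of the form "Temperature: 38 Celsius" for NVMe drives. The helper names nvme_temperature and classify_temperature are made up for this example, and the "warm" band between the two thresholds is my own reading of the guidance above.

```shell
# Hypothetical helper: read `smartctl -x` output on stdin and print the
# drive temperature in degrees Celsius. Lines such as
# "Temperature Sensor 1:" are ignored because their first field differs.
nvme_temperature() {
    awk '$1 == "Temperature:" && $3 == "Celsius" { print $2; exit }'
}

# Classify a whole-number temperature: below 50 is fine, above 60 is not,
# and anything in between is worth watching.
classify_temperature() {
    if [ "$1" -lt 50 ]; then
        echo "ok"
    elif [ "$1" -le 60 ]; then
        echo "warm"
    else
        echo "too hot"
    fi
}

# Usage on a real machine (needs root, so left commented out here):
# classify_temperature "$(sudo smartctl -x /dev/nvme0n1 | nvme_temperature)"
```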

Setting the power level / limit to something less aggressive is generally better for 24/7 running.

First, find the data sheet for your CPU by searching for the CPU model. This guide uses the Intel Core Ultra 5 125H as its example.

In the data sheet, look for one of:

  • Minimum Assured Power
  • TDP down
  • TDP low
  • TDP min

Boot the NUC and press F2 to get into BIOS.

Under "Power", you'll find "Package Power Limit 1", or possibly simply "Power Level 1". Set that to the min value you found, which is specific to the CPU. For this specific CPU, 20W.

Then, set "Package Power Limit 2" to the nearest number that's 1.25x to 1.3x that. In this example, that'd be 26W.
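
The arithmetic for "Package Power Limit 2" can be sketched in a few lines of shell. The helper name pl2_range is made up for this example. It uses integer math, so results are rounded down to whole watts.

```shell
# Hypothetical helper: given Package Power Limit 1 in whole watts, print
# the 1.25x and 1.3x bounds for Package Power Limit 2.
# Integer division truncates, so fractional watts are dropped.
pl2_range() {
    pl1=$1
    low=$(( pl1 * 125 / 100 ))
    high=$(( pl1 * 130 / 100 ))
    echo "$low $high"
}

pl2_range 20   # prints "25 26"
```

Any value within the printed range fits the 1.25x to 1.3x guidance.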

And that's it for controlling heat.

Kernel settings

Some models of NVMe drives can cause the system to lock up when the drive enables power savings. This can be cured by changing the kernel startup parameters.

sudo nano /etc/default/grub

Find the line GRUB_CMDLINE_LINUX_DEFAULT. Add to it, keeping what's already there: nvme_core.default_ps_max_latency_us=0 pcie_aspm=off. Exit with Ctrl-X, then confirm with Y and Enter to save the file.

sudo update-grub, then sudo reboot

This keeps the drive from entering power-saving states by itself.
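
To show what the edited line should look like, here is a small filter that appends the two parameters to a GRUB_CMDLINE_LINUX_DEFAULT line. It is a sketch for illustration, not a replacement for editing the file by hand. It assumes the value is wrapped in double quotes, and the helper name add_nvme_params is made up for this example.

```shell
# Hypothetical helper: read GRUB config text on stdin and print it with
# the two parameters appended inside the quotes of the
# GRUB_CMDLINE_LINUX_DEFAULT line. Parameters already present are not
# added twice; all other lines pass through unchanged.
add_nvme_params() {
    awk '
    /^GRUB_CMDLINE_LINUX_DEFAULT="/ {
        n = split("nvme_core.default_ps_max_latency_us=0 pcie_aspm=off", params, " ")
        value = $0
        sub(/^GRUB_CMDLINE_LINUX_DEFAULT="/, "", value)   # drop key and opening quote
        sub(/"[ \t]*$/, "", value)                        # drop closing quote
        for (i = 1; i <= n; i++)
            if (index(" " value " ", " " params[i] " ") == 0)
                value = (value == "" ? params[i] : value " " params[i])
        print "GRUB_CMDLINE_LINUX_DEFAULT=\"" value "\""
        next
    }
    { print }'
}

echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' | add_nvme_params
# prints: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
```

After the reboot, cat /proc/cmdline should list both parameters, and cat /sys/module/nvme_core/parameters/default_ps_max_latency_us should print 0.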

Kernel version

There are some reports that the Ubuntu 22.04 kernel can cause the Ethernet driver to lock up.

There are also reports that this was resolved by a BIOS update combined with a newer kernel, either by installing the hwe kernel package with sudo apt install --install-recommends linux-generic-hwe-22.04 or by upgrading to Ubuntu 24.04.

The BIOS update is key: we also have reports of the new kernel alone not resolving the issue.
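
To see which kernel a machine is running, uname -r is enough. The sketch below wraps that in a check. It assumes the original Ubuntu 22.04 kernel is the 5.15 series and that anything else is an HWE or newer kernel. The helper name is_ga_kernel is made up for this example.

```shell
# Hypothetical helper: succeed if a kernel release string (as printed by
# `uname -r`) belongs to the 5.15 series that Ubuntu 22.04 shipped with.
is_ga_kernel() {
    case $1 in
        5.15.*) return 0 ;;
        *)      return 1 ;;
    esac
}

if is_ga_kernel "$(uname -r)"; then
    echo "Running the original 22.04 kernel series; consider the hwe package."
else
    echo "Running $(uname -r), which is not the 5.15 series."
fi
```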

Some users side-stepped the issue entirely by using an Ethernet USB dongle instead of the built-in Ethernet.

Memory errors

All consumer RAM can fail, and it'll do so silently. The symptoms can range from corrupted data to the NUC "freezing".

To rule out (intermittently) faulty RAM, download memtest86+, flash it to a USB stick, boot from that USB stick, and run a continuous loop memory test for 5 days (!) or until you see errors, whichever comes first.

If no errors were seen after 5 days, the RAM is probably fine.

A single run of the memory test, or even 5 runs, is inconclusive. RAM failures can be quite intermittent.
