WARNING: This is only for a headless Framework Desktop and other headless Ryzen AI Max+ 395 128GB machines. I tried this on my Asus ROG Z13 with KDE running and it crashed my system hard. If you're running LLMs on a machine with a desktop environment, consider running the llama.cpp server with the Vulkan backend instead.
First, you have to set up your Framework Desktop to allow a large amount of GTT memory (system RAM that the GPU is allowed to map).
This was tested with the following modprobe.conf settings:
# Maximize GTT for LLM usage on 128GB UMA system
options amdgpu gttsize=120000
options ttm pages_limit=31457280
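
To see how the two values above relate, here is a small arithmetic sketch. It assumes that `gttsize` is given in MiB and that `pages_limit` counts 4 KiB pages (the usual x86-64 base page size); the helper names are made up for illustration and are not part of any tool.

```python
# Sanity-check the sizes implied by the modprobe settings above.
# Assumptions: amdgpu gttsize is in MiB, ttm pages_limit counts 4 KiB pages.

PAGE_SIZE_BYTES = 4096  # assumed base page size on x86-64


def pages_to_gib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a TTM page count to GiB."""
    return pages * page_size / 2**30


def gib_to_pages(gib, page_size=PAGE_SIZE_BYTES):
    """Convert a target size in GiB to a page count for pages_limit."""
    return gib * 2**30 // page_size


def mib_to_gib(mib):
    """Convert a gttsize value (MiB) to GiB."""
    return mib / 1024


if __name__ == "__main__":
    print(pages_to_gib(31457280))  # 120.0
    print(mib_to_gib(120000))      # 117.1875
    print(gib_to_pages(120))       # 31457280
```

Under those assumptions, the page limit works out to exactly 120 GiB and the GTT size to a little over 117 GiB, so the GTT size sits just under the page limit. If you want a different budget, `gib_to_pages` gives the matching `pages_limit` value.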