@jodoherty
jodoherty / README.md
Last active May 9, 2026 16:05
llama.cpp server AMD Radeon RX 7900 XTX perfect fit

This llama-server setup is tuned specifically for my AMD Radeon RX 7900 XTX, running Gemma 4 26B A4B as quantized by unsloth.

I've tuned it for stability first, then for as much practical quality as the VRAM limit allows.

It uses 99% of the VRAM on my setup, so there is no headroom left to push quality further.

I get between 100 and 120 tokens/second of generation throughput with a single user.

#!/bin/sh
type=cuda
type=rocm
model=google/gemma-4-26B-A4B-it
u=0.9
if [ "$#" -eq 0 ]; then
    set -- -d --restart=unless-stopped
fi
#!/bin/sh
type=cuda
type=rocm
type=vulkan
image=ghcr.io/ggml-org/llama.cpp:server-$type
model=gemma-4-26B-A4B-it
q=UD-Q4_K_XL
q=MXFP4_MOE
q=UD-Q8_K_XL
jodoherty / README.md
Last active April 26, 2026 23:28
vllm framework desktop setup

WARNING: This is only for a headless Framework Desktop and other Ryzen AI Max+ 395 128GB machines. I tried this on my ASUS ROG Z13 with KDE running and it crashed my system hard. If you're running LLMs on a machine with a desktop environment, consider llama.cpp server with the Vulkan backend instead.

First you have to set up your Framework Desktop to allow a large amount of GTT memory.

This was tested with the following modprobe.conf settings:

# Maximize GTT for LLM usage on 128GB UMA system
options amdgpu gttsize=120000
options ttm pages_limit=31457280
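
The two options use different units, so a quick conversion helps when adapting them to another machine. The sketch below assumes `gttsize` is given in MiB and `pages_limit` in 4 KiB pages (the usual x86_64 page size; check with `getconf PAGESIZE`). The helper names `pages_to_gib` and `mib_to_gib` are made up for illustration.

```sh
#!/bin/sh
# Assumption: 4 KiB pages, as is typical on x86_64.
page_kib=4

# Convert a TTM page count to whole GiB (rounds down).
pages_to_gib() {
    echo $(( $1 * page_kib / 1024 / 1024 ))
}

# Convert an amdgpu gttsize value in MiB to whole GiB (rounds down).
mib_to_gib() {
    echo $(( $1 / 1024 ))
}

echo "pages_limit=31457280 -> $(pages_to_gib 31457280) GiB"
echo "gttsize=120000 -> $(mib_to_gib 120000) GiB"
```

Under those assumptions, `pages_limit=31457280` works out to 120 GiB and `gttsize=120000` to roughly 117 GiB, both close to the full 128GB of unified memory.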
jodoherty / localclaude.md
Last active April 23, 2026 01:46
Local Claude Code setup with llama-server and Gemma 4 on a Framework Desktop

Enable larger GTT to fit models into memory.

options ttm pages_limit=31457280
options ttm page_pool_size=15728640

Download and stage gemma4 variants into a local directory for llama-server.

mkdir -p /srv/models/{gemma-4-26B-A4B-it-GGUF,gemma-4-E2B-it-GGUF,gemma-4-31B-it-GGUF}
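
The `{a,b,c}` brace expansion above is a bash/zsh feature and is not expanded by a strict POSIX `sh`. As a portable alternative, here is a minimal sketch using a loop. The function name `stage_dirs` is hypothetical, and the directory names are copied from the command above.

```sh
#!/bin/sh
# Create one directory per model variant under the given base directory.
# stage_dirs is an illustrative helper, not part of llama.cpp.
stage_dirs() {
    base=$1
    for name in gemma-4-26B-A4B-it-GGUF gemma-4-E2B-it-GGUF gemma-4-31B-it-GGUF; do
        mkdir -p "$base/$name" || return 1
    done
}
```

Calling `stage_dirs /srv/models` should produce the same three directories as the one-liner, assuming you have write access to `/srv`.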
jodoherty / main.py
Created March 26, 2025 02:02
Prefect extra loggers with threading example.
"""
Prefect extra loggers with threading example.
Run it like this:
PREFECT_LOGGING_EXTRA_LOGGERS=__main__ PREFECT_API_URL=http://127.0.0.1:4200/api python main.py
You should see the plain Python logging for the '__main__' package in the
Prefect UI.
jodoherty / .Xresources
Last active February 10, 2025 15:39
uxterm customization
UXTerm.termName: xterm-256color
!UXTerm*font: -misc-fixed-medium-r-semicondensed-*-13-*-*-*-*-*-iso10646-1
UXTerm*font: -gnu-unifont-medium-r-normal-*-16-*-*-*-*-*-iso10646-1
!UXTerm*reverseVideo: true
UXTerm*loginShell: true
UXTerm*visualBell: true
UXTerm*visualBellLine: true
UXTerm*altSendsEscape: true
#!/usr/sbin/nft -f
flush ruleset
table inet filter {
    chain input {
        type filter hook input priority 0;
        # accept any localhost traffic
        iif lo accept
jodoherty / 99-thinkpad.hwdb
Created February 3, 2024 23:06
Rebind PrtSc and CapsLock on Thinkpad laptops
evdev:atkbd:dmi:bvn*:bvr*:bd*:svnLENOVO*:pn*:pvrThinkPad*
 KEYBOARD_KEY_b7=rightmeta
 KEYBOARD_KEY_3a=leftctrl
jodoherty / pc-keys-for-terminals.json
Last active November 11, 2023 01:29
Karabiner Elements rule to use a PC style modifier key layout for terminals and VMs
{
  "title": "PC Keys for terminals/VMs",
  "rules": [
    {
      "description": "Swap around modifiers",
      "manipulators": [
        {
          "type": "basic",
          "from": {
            "key_code": "left_command",