@jodoherty
jodoherty / README.md
Last active June 8, 2026 13:08
SpacialUncertainty concern

This is a small example of a concern that makes it easy to model things that have a point location plus an error radius or error ellipse, with cached polygons that can be indexed and queried.

It was created by Gemma 31B.

It requires activerecord-postgis-adapter/rgeo.

License:

This is free and unencumbered software released into the public domain.

@jodoherty
jodoherty / rtxpro6000_benchmark.md
Last active May 16, 2026 03:26
RTX Pro 6000 Blackwell benchmarks (vLLM)

RTX Pro 6000 Blackwell vLLM Benchmarks

Hardware:

01:00.0 VGA compatible controller: NVIDIA Corporation GB202GL [RTX PRO 6000 Blackwell Workstation Edition] (rev a1)

Summary

@jodoherty
jodoherty / v100_benchmarks.md
Last active May 16, 2026 02:45
8xV100 Benchmarks

Benchmarked with llama-benchy:

https://github.com/eugr/llama-benchy

May 14th, 2026

This was done on a Lambda.ai rental, so I didn't want to spend a lot of time testing different prompt sizes and context depths. I just did a basic set of benchmarks with some different quantizations of Gemma 4 26B A4B and Gemma 4 31B.

The machine had 8 V100 GPUs with 16GB of VRAM each:

@jodoherty
jodoherty / README.md
Last active May 9, 2026 16:05
llama.cpp server AMD Radeon RX 7900 XTX perfect fit

This llama-server setup is tuned specifically for my AMD Radeon RX 7900 XTX, running Gemma 4 26B A4B as quantized by unsloth.

I've set it up to be stable first, and then to keep as much practical quality as the VRAM limit allows.

It uses 99% of the VRAM on my setup, so there's no room left to push it further.

I get roughly 100 to 120 tokens/second of token generation with a single user.

#!/bin/sh
type=cuda
type=rocm
model=google/gemma-4-26B-A4B-it
u=0.9
if [ "$#" -eq 0 ]; then
set -- -d --restart=unless-stopped
fi
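
The `if [ "$#" -eq 0 ]` check in the preview above is a default-arguments idiom: when the script is called with no arguments, `set --` replaces the empty positional parameters with defaults. The flags look like container run options, but the preview cuts off before the command that receives them, so the sketch below only prints the resulting arguments. `run_args` is a hypothetical name used for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper illustrating the default-arguments idiom.
run_args() {
    # "$#" is the number of arguments passed to the function.
    if [ "$#" -eq 0 ]; then
        # set -- replaces the positional parameters with these defaults.
        set -- -d --restart=unless-stopped
    fi
    # Print the arguments that would be passed along.
    printf '%s\n' "$*"
}

defaults=$(run_args)
custom=$(run_args --rm -it)
echo "no args:   $defaults"
echo "with args: $custom"
```

Passing any argument at all skips the defaults entirely, which makes it easy to run the same script interactively for a one-off test.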
#!/bin/sh
type=cuda
type=rocm
type=vulkan
image=ghcr.io/ggml-org/llama.cpp:server-$type
model=gemma-4-26B-A4B-it
q=UD-Q4_K_XL
q=MXFP4_MOE
q=UD-Q8_K_XL
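
In sh, a later assignment replaces an earlier one, so the stacked `type=` and `q=` lines above behave like a quick toggle where the last line wins. As an alternative sketch (not what the script above does), the same choice could come from the environment with a fallback default. `LLAMA_TYPE` and `LLAMA_QUANT` are hypothetical variable names chosen for this example.

```shell
#!/bin/sh
# ${VAR:-default} expands to the default when VAR is unset or empty.
# LLAMA_TYPE and LLAMA_QUANT are hypothetical names, not part of the gist.
type=${LLAMA_TYPE:-vulkan}
q=${LLAMA_QUANT:-UD-Q8_K_XL}

# Same image tag pattern as in the preview above.
image=ghcr.io/ggml-org/llama.cpp:server-$type

echo "image: $image"
echo "quant: $q"
```

Editing the script and reordering lines is simpler for a personal setup; the environment variant mainly helps when the same script is shared across machines with different backends.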
@jodoherty
jodoherty / README.md
Last active April 26, 2026 23:28
vllm framework desktop setup

WARNING: This is only for headless Framework Desktop and other AI Max+ 395 128GB machines. I tried this on my Asus ROG Z13 with KDE running and it crashed my system hard. If you're using LLMs on a machine with a desktop environment, consider running llama.cpp server with the Vulkan backend instead of this.

First you have to set up your Framework Desktop to allow a large amount of GTT memory.

This was tested with the following modprobe.conf settings:

# Maximize GTT for LLM usage on 128GB UMA system
options amdgpu gttsize=120000
options ttm pages_limit=31457280
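
As a rough sanity check on these two values, the sketch below converts both to MiB so they can be compared. It assumes that `ttm pages_limit` counts 4 KiB pages (the usual page size on x86_64) and that `amdgpu gttsize` is given in MiB; check both against your kernel's module documentation before relying on them.

```shell
#!/bin/sh
# Assumptions: 4 KiB pages, and gttsize expressed in MiB.
page_kib=4
pages_limit=31457280
gttsize_mib=120000

# pages -> MiB: pages * 4 KiB / 1024 KiB per MiB
pages_limit_mib=$((pages_limit * page_kib / 1024))
pages_limit_gib=$((pages_limit_mib / 1024))

echo "ttm pages_limit: ${pages_limit_mib} MiB (${pages_limit_gib} GiB)"
echo "amdgpu gttsize:  ${gttsize_mib} MiB"
```

Under those assumptions the page limit works out to 120 GiB, slightly above the requested GTT size, so the GTT size rather than the page limit is the tighter bound.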
@jodoherty
jodoherty / localclaude.md
Last active April 23, 2026 01:46
Local Claude Code setup with llama-server and gemma4 using a framework desktop

Enable larger GTT to fit models into memory.

options ttm pages_limit=31457280
options ttm page_pool_size=15728640
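
To pick values for a different memory target, the page counts can be derived from a size in GiB. This is a minimal sketch assuming 4 KiB pages; `gib_to_pages` is a hypothetical helper, and the 120 GiB and 60 GiB targets are simply the sizes that reproduce the two numbers above.

```shell
#!/bin/sh
# Hypothetical helper: convert a GiB target into a count of 4 KiB pages.
gib_to_pages() {
    # 1 GiB = 1024 * 1024 KiB; each page is assumed to be 4 KiB.
    echo $(( $1 * 1024 * 1024 / 4 ))
}

limit=$(gib_to_pages 120)
pool=$(gib_to_pages 60)

echo "options ttm pages_limit=$limit"
echo "options ttm page_pool_size=$pool"
```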

Download and stage gemma4 variants into a local directory for llama-server.

mkdir -p /srv/models/{gemma-4-26B-A4B-it-GGUF,gemma-4-E2B-it-GGUF,gemma-4-31B-it-GGUF}
@jodoherty
jodoherty / main.py
Created March 26, 2025 02:02
Prefect extra loggers with threading example.
"""
Prefect extra loggers with threading example.
Run it like this:
PREFECT_LOGGING_EXTRA_LOGGERS=__main__ PREFECT_API_URL=http://127.0.0.1:4200/api python main.py
You should see the plain Python logging for the '__main__' package in the
Prefect UI.
@jodoherty
jodoherty / .Xresources
Last active February 10, 2025 15:39
uxterm customization
UXTerm.termName: xterm-256color
!UXTerm*font: -misc-fixed-medium-r-semicondensed-*-13-*-*-*-*-*-iso10646-1
UXTerm*font: -gnu-unifont-medium-r-normal-*-16-*-*-*-*-*-iso10646-1
!UXTerm*reverseVideo: true
UXTerm*loginShell: true
UXTerm*visualBell: true
UXTerm*visualBellLine: true
UXTerm*altSendsEscape: true