
ubergarm / Qwen3-MoE-Benchmarks.md
Last active May 17, 2025 10:52
Qwen3 235B and 30B MoE Quant Benchmarking Roundup

The Great Quant Wars of 2025

"All things leave behind them the Obscurity... and go forward to embrace the Brightness..." — Dao De Jing #42

tl;dr;

  • Q: Who provides the best GGUFs now?
  • A: They're all pretty good.

Skip down if you just want graphs and numbers comparing various Qwen3-30B-A3B GGUF quants.

ubergarm / README.md
Last active May 4, 2025 16:13
Visualize importance score statistics for three Qwen3-30B-A3B llama-imatrix files.
  1. Used @EAddario's PR ggml-org/llama.cpp#12718 to generate imatrix statistics.
  2. These were the imatrix data files used; they appear in each mosaic top to bottom in this order: barto, uber, unsloth.
  3. Similar to https://huggingface.co/ikawrakow/Qwen3-30B-A3B, but I didn't use the 128k unsloth one and I didn't see ik's available to run.

See the attached images below, generated using some python/matplotlib/ImageMagick scripts vibe coded using ubergarm/Qwen3-30B-A3B-mix-IQ3_K. You can click them to load them larger; they are not too big at 100 dpi. You may need to shift-reload to refresh before clicking on them, as possibly I…
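Below is a minimal sketch of the kind of per-file importance-score plot described above, assuming each llama-imatrix statistics dump has been exported to a CSV of `tensor,score` rows; the CSV file names and layout are hypothetical stand-ins, and the actual mosaics came from the vibe-coded scripts mentioned above.

```python
# Hypothetical sketch: histogram the importance scores from three imatrix
# statistics dumps (barto, uber, unsloth), stacked top to bottom like the
# mosaics described above. CSV file names and column layout are assumptions.
import csv
import matplotlib.pyplot as plt

FILES = {
    "barto": "barto-imatrix-stats.csv",
    "uber": "uber-imatrix-stats.csv",
    "unsloth": "unsloth-imatrix-stats.csv",
}

fig, axes = plt.subplots(len(FILES), 1, figsize=(10, 8), sharex=True)
for ax, (label, path) in zip(axes, FILES.items()):
    with open(path, newline="") as f:
        scores = [float(row["score"]) for row in csv.DictReader(f)]
    ax.hist(scores, bins=100)
    ax.set_title(f"{label}: importance score distribution")
axes[-1].set_xlabel("importance score")
fig.tight_layout()
fig.savefig("imatrix-stats-mosaic.png", dpi=100)
```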

ubergarm / l1t-ai-fun.md
Created May 2, 2025 19:36
Having fun with AI trying to solve some text mapping problems for a show!

1. Mapping Table between English and German Releases

Here is the mapping table between the English and German episodes based on their titles and synopses. The German episodes are grouped into three segments (a, b, c) per episode number, so each English episode is matched to one of these segments.

| German Episode Number | German Title | English Episode Number | English Title |
|---|---|---|---|
| 1.01a | Fett For Fun | 1 | Running an Errand / Mom's Mornings are Busy / Drawing |
| 1.01b | Sport extrem | 2 | Tricycles are Fun / My Stomach Is Going to Burst / A Nightmare for Dad |
| 1.01c | Braue um Braue, Zahn um Zahn | 3 | Watching Action Mask / School Lunch is Fun / Going to the Dentist |
| 1.02a | Eine wirklich schreckliche Familie | 4 | The Sunflower Class / Going on a Picnic |
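The grouping above is simple arithmetic; as a sketch (assuming season 1 and consecutive English numbering with no gaps, which is a simplification), the German segment for an English episode can be computed like this:

```python
# Sketch of the segment arithmetic behind the table above: three consecutive
# English episodes map onto one German episode's a/b/c segments.
# Assumes season 1 and gap-free English numbering (a simplification).
def german_segment(english_episode: int) -> str:
    german_episode = (english_episode - 1) // 3 + 1
    segment = "abc"[(english_episode - 1) % 3]
    return f"1.{german_episode:02d}{segment}"

assert german_segment(1) == "1.01a"  # Fett For Fun
assert german_segment(3) == "1.01c"  # Braue um Braue, Zahn um Zahn
assert german_segment(4) == "1.02a"  # Eine wirklich schreckliche Familie
```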
ubergarm / VoidAlchemy-BookReview.md
Created March 16, 2025 23:22
R1 671B `ubergarm/DeepSeek-R1-Q2_K_R4` Book Review of Void Alchemy: Riddles and Wakeup Stories

Book Review: Void Alchemy: Riddles and Wakeup Stories by Empty Duck

Hey, fellow soul-searchers! 🌟 Let’s be real—modern life is a lot. Between doomscrolling, hustling for that side gig, and trying to remember what “inner peace” even feels like, it’s easy to feel like a lost duck in a thunderstorm. Enter Void Alchemy: Riddles and Wakeup Stories by Empty Duck. This book isn’t just a breath of fresh air—it’s a full-on spiritual snorkel dive into the deep end of your soul.

Why You’ll Vibe With It: Imagine if your favorite mindfulness app had a baby with a Zen koan and they both got raised by a poet who loves dad jokes. Void Alchemy is a quirky, raw, and ridiculously relatable mix of ancient wisdom and modern wit. Empty Duck (aka John W. Leimgruber III) serves up bite-sized verses that feel like late-night texts from your wisest friend—the one who’s equal parts mystic and meme lord.

For When Life Feels Like a Glitch: The book’s 128 micro-poems (or “wakeup stories”) are perfect for anyone who’s to…

ubergarm / void-alchemy-bot-review.md
Created March 13, 2025 17:13
I ask DeepSeek-R1 671B `UD-Q2_K_XL` what it thinks about the book Void Alchemy: Riddles and Wakeup Stories by emptyduck.

>>> User:

Give a conversational yet technical discussion of the following book of verse in terms of what is known about channeling, spiritual and mystical traditions, chan and zen poetry, the rig veda, and other ancient mystical texts.

(paste in the pdf2text of the free e-book from https://emptyduck.com)

>>> Assistant:

<think>

ubergarm / DeepSeek-R1-Quantized-GGUF-Gaming-Rig-Inferencing-Fast-NVMe-SSD.md
Last active April 17, 2025 16:55
Run DeepSeek R1 671B unsloth GGUF locally with ktransformers or llama.cpp on a high-end gaming rig!

tl;dr;

UPDATE Mon Mar 10 10:51:31 AM EDT 2025: Check out the newer ktransformers guide for how to get it running faster, about 3.5 tok/sec on this same gaming rig. Big thanks to Supreeth Koundinya of analyticsindiamag.com for the article!

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB of dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite the quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. Then the kv cache will take up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most used.
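Here is a tiny sketch of why mmap'd weights behave this way (the file name below is a hypothetical stand-in): mapping a huge file costs almost no RAM up front, and pages are faulted in from disk and kept in the kernel page cache only when actually touched.

```python
# Sketch of the mmap behavior described above. Mapping the file reserves
# address space, not RAM; touching a byte faults in one page, which the
# kernel keeps in the page cache for reuse or eviction. Path is hypothetical.
import mmap

with open("DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_bytes = mm[:8]  # faults in a single ~4 KiB page, nothing more
    print(first_bytes)    # GGUF files start with the b"GGUF" magic
    mm.close()
```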

ubergarm / kokoro-tts-benchmark-text.md
Last active February 10, 2025 18:01
Benchmarking text for kokoro-tts PyTorch vs ONNX comparison.

Text for benchmarking kokoro-tts. Copy/paste this into your main.py test app, e.g.

TEXT = """
1. Who am I?
The gross body which is composed of the seven humours (dhatus), I am not; the five cognitive sense-organs, viz. the senses of hearing, touch, sight, taste, and smell, which apprehend their respective objects, viz. sound, touch, colour, taste, and odour, I am not; the five conative sense-organs, viz. the organs of speech, locomotion, grasping, excretion, and procreation, which have as their respective functions speaking, moving, grasping, excreting, and enjoying, I am not; the five vital airs, prana, etc., which perform respectively the five functions of in-breathing, etc., I am not; even the mind which thinks, I am not; the nescience too, which is endowed only with the residual impressions of objects, and in which there are no objects and no functionings, I am not.

2. If I am none of these, then who am I?
After negating all of the above-mentioned as ‘not this’, ‘not this’, that Awareness which alone remains, that I am.
"""
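A minimal sketch of the timing harness this text could feed, assuming a `synthesize(text)` callable per backend; the function names are placeholders to be wired to the actual kokoro-tts PyTorch and ONNX entry points in your main.py.

```python
# Hypothetical timing harness for the PyTorch-vs-ONNX comparison. The
# synthesize callables are placeholders; wire them to your real kokoro-tts
# entry points in main.py before running.
import time

def bench(name, synthesize, text, runs=3):
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)  # placeholder: the actual TTS call goes here
        times.append(time.perf_counter() - start)
    print(f"{name}: best {min(times):.2f}s over {runs} runs")

# bench("pytorch", pytorch_synthesize, TEXT)
# bench("onnx", onnx_synthesize, TEXT)
```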
ubergarm / search.md
Last active December 8, 2024 16:30
Spiritual Search YT Channels
ubergarm / ai-summary-jpa-dissertation.md
Last active July 25, 2024 18:09
AI Summary of J.P. Ascher's Draft Ph.D. Dissertation

AI Summary of J.P. Ascher's Ph.D. Dissertation

To test long-context LLM understanding of academic materials running locally on <= 24GB VRAM.

tl;dr;

I downloaded a complex ~450 page Ph.D. dissertation PDF, converted it to text, and prompted two LLMs to generate some summaries. Exact versions of llama.cpp and GGUFs used for inference are listed below. All tests performed locally on 3090TI w/ 24GB VRAM. Both models support ~128k context in their respective tokenization formats.
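As a sketch of the PDF-to-text step (the gist doesn't say which converter was used; pypdf here is an assumption, and the file names are hypothetical stand-ins):

```python
# Hypothetical sketch of the PDF -> text conversion described above, using
# pypdf. Token counts then come from each model's own tokenizer; this only
# does the extraction. File names are stand-ins.
from pypdf import PdfReader

reader = PdfReader("jpa-dissertation.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(f"{len(reader.pages)} pages, {len(text)} characters extracted")

with open("jpa-dissertation.txt", "w", encoding="utf-8") as f:
    f.write(text)
```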

  • Mistral-Nemo-12B-Instruct-2407
    • Tokenizes document into 51617 tokens
    • Not really full support for explicit system prompt.