Skip to content

Instantly share code, notes, and snippets.

View chadbrewbaker's full-sized avatar

Chad Brewbaker chadbrewbaker

View GitHub Profile
@chadbrewbaker
chadbrewbaker / oracle.md
Created June 8, 2026 15:51
Computational Depth Oracle Questions (ChatGPT prompted w Fortnow depth publication and P = NC literature

Oracle 1: Perfect Retrieval Suppose an oracle answers every factual question instantly. The model never forgets anything. Question: Does theorem proving become easy? If not, then factual knowledge wasn't the issue. This separates memory from reasoning. Oracle 2: Perfect Search Suppose an oracle instantly returns the best next action among all possibilities. Question:

@chadbrewbaker
chadbrewbaker / retardmaxx_muon.py
Created June 6, 2026 03:10
Retardmaxxing the Muon optimizer
import torch
import torch.nn as nn
from tqdm import tqdm
# --- CONFIG ---
DIM = 2048
BLOCK_SIZE = 16
LR = 0.01
ITERATIONS = 100
@chadbrewbaker
chadbrewbaker / run_gemma.sh
Created June 3, 2026 19:18
Gemma 12B llama.cpp settings
./llama-server \
-m gemma-4-12B-it-Q4_K_M.gguf \ # Target model
-md gemma-4-12B-it-assistant-Q5_K_M.gguf \ # MTP drafter (small ~0.4B)
--spec-type draft-mtp \ # Enable MTP speculative decoding
--spec-draft-n-max 4 \ # Typically 3-5 for Gemma 4 MTP (experiment)
--spec-draft-n-min 1 \
-c 131072 \ # Context (try 262144 if you have memory)
--cache-type-k q4_0 \ # 4-bit KV cache
--cache-type-v q4_0 \
-ngl 99 \ # Offload all layers to GPU
@chadbrewbaker
chadbrewbaker / README.md
Created May 25, 2026 18:23
Simple Hermes Agent profiler to diagnose API call hangs
uv run profile.py ~/.hermes/logs/agent.log
@chadbrewbaker
chadbrewbaker / interaction.c
Created May 3, 2026 20:01
Interaction calculus in C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Raw dog Interaction Calculus demo in C
// Demonstrates superpositions (SUP), lambdas, apps, global scoping via env
// Uses function pointers for reduction rules and term kinds
typedef enum {
TERM_VAR,
@chadbrewbaker
chadbrewbaker / run_qwen.sh
Created April 28, 2026 22:32
Localhost llama.cpp configuration for Qwen3.6
./build/bin/llama-server \
-m ~/models/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf \
-ngl 99 \
-c 262144 \
-np 1 \
-fa on \
--jinja --reasoning-format deepseek \
--cache-type-k q4_0 \
--cache-type-v q4_0 \
--host 0.0.0.0
@chadbrewbaker
chadbrewbaker / iowa_offshore_2023.txt
Created April 26, 2026 03:52
FY2023 Iowa Nonprofit Offshore Activity
76 420680387 THE TRUSTEES OF GRINNELL COLLEGE $1,086,636,902.00
227 421143702 IOWA STATE UNIVERSITY FOUNDATION $292,636,627.00
491 420698295 MERCY MEDICAL CENTER $97,889,322.00
555 844628214 University of Iowa Strategic Initiatives Fund $82,026,731.00
637 426139033 COMMUNITY FDN OF GREATER DES MOINES $66,046,696.00
711 420680460 DRAKE UNIVERSITY $53,492,350.00
760 420796760 STATE UNIVERSITY OF IOWA FOUNDATION $48,287,270.00
771 510233180 MERCY HOSPITALCEDAR RAPIDSIA ENDOWMENT $47,162,540.00
815 420703280 ST AMBROSE UNIVE
@chadbrewbaker
chadbrewbaker / offshore_2023.txt
Created April 26, 2026 03:26
NONPROFIT ORGANIZATIONS BY OFFSHORE EXPENDITURES/ASSETS 2023
This file has been truncated, but you can view the full file.
ORG_EIN ORG_NAME_L1 SF_01_FRGN_REG_TOT_EXP
0 980571483 NOVO HOLDINGS AS $23,648,391,120.00
1 941156365 THE BOARD OF TRUSTEES OF THE LELAND $18,850,601,940.00
2 941105628 KAISER FOUNDATION HOSPITALS $12,144,993,009.00
3 210634501 The Trustees of Princeton University $11,728,687,974.00
4 350868188 University of Notre Dame du Lac $11,239,145,760.00
5 980593375 GAVI ALLIANCE $10,845,619,280.00
6 590735717 Howard Hughes Medical Institute $10,526,015,441.00
7 60646973 Yale Unive
@chadbrewbaker
chadbrewbaker / timeshare.txt
Created April 5, 2026 18:46
llama.cpp time sharing
Yes — this is exactly what llama-server's slot system (+ your queue dispatcher) was built for. You can run one llama-server instance on a single GGUF, expose several independent message queues, and give each queue explicit time slices that are long enough to amortize the cost of flushing the KV cache for a fresh context window. Model weights stay hot in VRAM/CPU the whole time; only per-slot KV caches are flushed/recomputed when you want a truly fresh start.Why this works (no cache thrashing)Slots = isolated KV caches: Start the server with -np N (N = number of queues + 1–2 buffer). Each slot gets its own KV cache. The server automatically assigns requests to slots (or you pin with "slot_id": X in the JSON payload). Model weights (the GGUF) are loaded once and stay cached.
Fresh context = intentional flush: In every request from a queue, include "cache_prompt": false. This forces the server to discard/recompute the KV cache for that request (fresh prefill, no reuse from prior work in the slot).
No thrashi
@chadbrewbaker
chadbrewbaker / llm.py
Created March 25, 2026 14:43
Patching tinygrad llm runner for Qwen 3.5 mamba
from __future__ import annotations
import sys, argparse, typing, re, unicodedata, json, uuid, time, functools, itertools
from dataclasses import dataclass
from tinygrad import Tensor, nn, UOp, TinyJit, getenv, function
from tinygrad.uop.ops import resolve
from tinygrad.helpers import partition, DEBUG, Timing, GlobalCounters, stderr_log, colored, Context
from tinygrad.viz.serve import TCPServerWithReuse, HTTPRequestHandler
class SimpleTokenizer:
def __init__(self, normal_tokens:dict[str, int], special_tokens:dict[str, int], preset:str="llama3"):