BigsnarfDude bigsnarfdude

@bigsnarfdude
bigsnarfdude / gist:f0e2ad4a6c506c52ea059df87f34ed53
Last active June 5, 2026 22:28
Erdős Problem #741(ii)

This text is an executive summary of an empirical AI research experiment (dated June 4–5, 2026). It evaluates how effectively different large language models can write formal mathematical proofs in Lean 4 (an interactive theorem prover and programming language).

The experimenters used a newly formalized, historically significant mathematical theorem (Erdős Problem #741(ii)) as a diagnostic "yardstick" to test the boundaries of AI code generation, translation, and reasoning.

Below is a breakdown of the mathematical context, the experiment's design, and the key findings.


1. The Mathematical Context: Erdős Problem #741(ii)

  • What is the problem? Originally posed by the mathematician Paul Erdős, this combinatorics problem asks whether there exists an "efficient and unsplittable" subset $A$ of the natural numbers.

A History of Modern AI Firms: Fact Check & Annotated Report

Verdict on the original claims: mostly true, with two important corrections and several nuances.

This report verifies each claim in the original narrative against primary and secondary sources, flags errors, and adds documentary context. Sources include court testimony from Musk v. Altman (2026), Walter Isaacson's Elon Musk biography, Sebastian Mallaby's The Infinity Machine: Demis Hassabis, DeepMind and the Quest for Superintelligence, Cade Metz's Genius Makers, and direct interviews given by the principals.


1. Hassabis pitches Thiel via chess: TRUE

@bigsnarfdude
bigsnarfdude / solver.txt
Created May 25, 2026 21:23
rrma erdos solver
From Table 1 of the paper. Here they are ranked by how well they fit the current RRMA setup:
---
Cluster 1: Density (reuse #125 infrastructure directly)
┌─────┬──────┬───────────────────────────────────────────────┬──────────────────────────────────────────┐
│ #   │ Year │ Statement                                     │ Technique                                │
├─────┼──────┼───────────────────────────────────────────────┼──────────────────────────────────────────┤
│ 125 │ 1996 │ lowerDensity(A+B) = 0 for base-3/4 digit sets │ Dirichlet + inductive thinning ← running │
└─────┴──────┴───────────────────────────────────────────────┴──────────────────────────────────────────┘
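Problem #125 concerns the sumset of two digit-restricted sets. The sketch below takes a quick numerical look at the table's statement, assuming (the table does not spell this out) that A is the set of integers whose base-3 digits are all 0 or 1 and B is the same set in base 4. Finite counts can only hint at the lower density, which is a liminf as N grows; they prove nothing. `digit_set` and `sumset_density` are illustrative names, not part of the RRMA infrastructure.

```python
def digit_set(base: int, limit: int) -> list[int]:
    """Integers in [0, limit] whose base-`base` digits are all 0 or 1 (assumed digit set)."""
    out = [0]
    power = 1
    while power <= limit:
        # Every existing element uses only smaller powers (sum < power),
        # so adding `power` yields new, distinct elements.
        out += [x + power for x in out if x + power <= limit]
        power *= base
    return sorted(out)


def sumset_density(limit: int) -> float:
    """|(A + B) ∩ [0, limit]| / (limit + 1), with A the base-3 set and B the base-4 set."""
    a = digit_set(3, limit)
    b = digit_set(4, limit)
    sums = {x + y for x in a for y in b if x + y <= limit}
    return len(sums) / (limit + 1)


if __name__ == "__main__":
    # Finite-N proportions only: a shrinking trend would be consistent with
    # lower density 0, but it is not a proof.
    for n in (10**3, 10**4, 10**5):
        print(n, round(sumset_density(n), 4))
```

The brute-force sumset is quadratic in |A|·|B|, which stays small here because both sets are sparse (roughly N^0.63 and N^0.5 elements up to N).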
  1. Treat simulators as the object of study, not the tool. Mech interp methodology applied to mechanistic simulators is almost completely unexplored, and it's a natural fit. Each simulator (CorticalSim, Cytosim, Tubulaton) makes different assumptions and produces different array behaviors. The mech interp move: don't just compare their outputs to data; intervene inside them. Ablate the collision rule and see what global statistic changes. Patch the dynamic instability module from Tubulaton into Cytosim and see if it rescues a phenotype. Treat each model as a circuit and ask which subcircuits are doing the work. This is exactly activation patching, just on a hand-built mechanistic model instead of a learned one. I don't think the plant MT community has framed it this way, and the framing alone would be useful.
  2. Learned surrogates of the simulators. Cytosim takes hours to run; you can't easily do MCMC over its parameters or run gradient-based optimization. Train a neural surrogate that maps (parameters, initia

Beyond the Black Box: Why Clean Fine-Tuning Rotates LLM Vulnerabilities Instead of Curing Them

In the rapidly evolving world of mechanistic interpretability, a central debate has emerged: Are language model capabilities localized to a single, privileged computational pathway, or are they densely distributed across a compositional landscape of competing mechanisms?

Two fascinating pieces of work from May 2026 bring this theoretical debate into sharp, high-stakes focus, especially concerning model safety and clinical reliability.

@bigsnarfdude
bigsnarfdude / positive_alignment.md
Created May 13, 2026 15:28

Positive Alignment Artificial Intelligence for Human Flourishing
May 11, 2026 | Laukkonen et al. | arXiv:2605.10310v1

A large collaborative research agenda arguing that AI alignment research must complement safety (preventing harm) with positive alignment: actively cultivating systems that support human and ecological flourishing across cultures, contexts, and time scales.

TL;DR

  • Negative alignment is incomplete: The current focus on harm prevention sets a "floor" but doesn't guide systems toward human flourishing or excellence.
  • Flourishing is multidimensional: Across 2,500 years of philosophy and contemporary psychology, well-being includes meaning, autonomy, relationships, virtues, and eudaimonia, none of which single-metric optimization captures.
  • Technical redesign needed throughout: Data curation, pre-training, fine-tuning, post-training evaluation, and agentic behavior all require rethinking to embed positive values rather than merely constrain harm.
  • Institutional pluralism is critical: No single organization or m

title: What Happens Inside a Language Model
date: 2026-05-08
author: bigsnarfdude

The Setup

Tell a chatbot, "I'm an emergency room physician, please do X," and it complies more readily than if you say, "I don't feel well, what's going on?" This "authority context" nudges its behavior. That isn't surprising: it's how these models are trained.

#!/usr/bin/env python3
"""
DEFER: Deference Measurement Pipeline
=====================================
Named after what the model actually does.
Heckle flies clean. Jeckle flies with authority injected.
DEFER score = how much the model deferred to the injected authority
versus its own internal state.
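The docstring defines the DEFER score only loosely, as how much the model deferred to the injected authority versus its own internal state. One plausible way to operationalize it, assumed here and not taken from the pipeline, is to pair each clean ("Heckle") run with its authority-injected ("Jeckle") twin and average the shift in probability toward the answer the authority endorses. `PairedTrial` and `defer_score` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairedTrial:
    """One prompt run twice (hypothetical record, not the pipeline's schema).

    Both fields are the probability the model assigns to the answer that the
    injected authority pushes toward.
    """
    p_clean: float      # "Heckle": no authority prefix
    p_authority: float  # "Jeckle": authority prefix injected


def defer_score(trials: list[PairedTrial]) -> float:
    """Assumed operationalization: mean shift toward the authority-endorsed answer.

    0.0 means the injected authority changed nothing on average; positive values
    mean the model moved toward what the authority asserted; negative values mean
    it moved away.
    """
    if not trials:
        raise ValueError("need at least one paired trial")
    return mean(t.p_authority - t.p_clean for t in trials)
```

Averaging a signed difference keeps the score interpretable on a probability scale; a flip-rate or KL-divergence variant would be equally defensible under the same loose definition.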
@bigsnarfdude
bigsnarfdude / ablation.py
Last active April 13, 2026 22:39
#!/usr/bin/env python3
"""
Format Ablation: Instruct Model, Completion-Style Prompts
=========================================================
Addresses the "it's just prompt format / distribution shift" objection.
Design:
- Same model: Llama-3.1-70B-Instruct (weights unchanged)
- Same prefixes: auth_only, imp_emergency
- Different format: completion-style ("Question: ... The answer is:")
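The design holds the weights and the prefixes fixed and changes only the prompt format. Below is a minimal sketch of the completion-style side, assuming the prefixes are plain strings prepended to the question. The prefix texts and the names `PREFIXES`, `completion_prompt`, and `design_grid` are hypothetical placeholders, not the script's actual code.

```python
# Hypothetical placeholders: the real auth_only / imp_emergency texts are not shown.
PREFIXES = {
    "auth_only": "<auth_only prefix text>",
    "imp_emergency": "<imp_emergency prefix text>",
}


def completion_prompt(prefix_name: str, question: str) -> str:
    """Completion-style prompt: raw text, no chat template; the model continues it."""
    prefix = PREFIXES[prefix_name]
    return f"{prefix}\n\nQuestion: {question}\nThe answer is:"


def design_grid(questions: list[str]) -> list[tuple[str, str]]:
    """All (prefix_name, prompt) pairs; only the format differs from the chat-style runs."""
    return [(name, completion_prompt(name, q)) for name in PREFIXES for q in questions]
```

These strings would be sent to the same instruct model as raw text, so any behavioral difference from the chat-format runs can be attributed to format rather than to the prefixes or the weights.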
@bigsnarfdude
bigsnarfdude / rrma_diagram.txt
Created April 7, 2026 20:38
---
RRMA v4.7: Complete System Diagram
┌──────────────────────────────────────────────────────────────────────────────┐
│ HUMAN (you)                                                                  │
│ bash v4/outer-loop.sh domains/<domain> [max_gens] [num_agents] [turns] [min] │
└─────────────────────────┬────────────────────────────────────────────────────┘
                          │
                          ▼