BigsnarfDude bigsnarfdude
@bigsnarfdude
bigsnarfdude / fizzbuzz.py
Created November 21, 2025 23:57
Fizz Buzz using Python monoids - AlgeSnake
"""
FizzBuzz using Monoid patterns inspired by algesnake
Demonstrates how abstract algebra makes the solution composable and elegant
"""
from abc import ABC, abstractmethod
from typing import TypeVar, Generic, Callable
from functools import reduce
T = TypeVar('T')
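The preview above is truncated, so the gist's actual classes are not visible. As a minimal sketch of the idea the docstring describes, and not the author's implementation, the block below assumes a hypothetical `Monoid` base class, a `StringMonoid` instance, and a `rule` helper; none of these names come from the gist or from algesnake.

```python
from abc import ABC, abstractmethod
from functools import reduce
from typing import Callable, Generic, Iterable, Sequence, TypeVar

T = TypeVar("T")


class Monoid(ABC, Generic[T]):
    """Hypothetical base class: an associative combine plus an identity."""

    @abstractmethod
    def empty(self) -> T:
        """Return the identity element."""

    @abstractmethod
    def combine(self, a: T, b: T) -> T:
        """Combine two values associatively."""

    def concat(self, items: Iterable[T]) -> T:
        # Fold all items together, starting from the identity element.
        return reduce(self.combine, items, self.empty())


class StringMonoid(Monoid[str]):
    """Strings under concatenation, with "" as the identity."""

    def empty(self) -> str:
        return ""

    def combine(self, a: str, b: str) -> str:
        return a + b


def rule(divisor: int, word: str) -> Callable[[int], str]:
    """Build a rule that yields `word` for multiples of `divisor`, else ""."""
    return lambda n: word if n % divisor == 0 else ""


RULES: Sequence[Callable[[int], str]] = (rule(3, "Fizz"), rule(5, "Buzz"))


def fizzbuzz(n: int, rules: Sequence[Callable[[int], str]] = RULES) -> str:
    # Each rule contributes a word or the identity; the monoid merges them.
    words = StringMonoid().concat(r(n) for r in rules)
    return words or str(n)


if __name__ == "__main__":
    print(" ".join(fizzbuzz(n) for n in range(1, 16)))
```

Because every rule returns either a word or the identity, adding a new rule such as `rule(7, "Bazz")` needs no change to `fizzbuzz` itself.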
@bigsnarfdude
bigsnarfdude / calculate.md
Last active November 21, 2025 12:48
calculate.md

Layer Effectiveness Metrics

L0 Bouncer Effectiveness

| Metric | What it measures | Calculation |
| --- | --- | --- |
| Coverage | % handled without escalation | (total - needs_l1) / total |
| Accuracy | When confident, is it correct? | correct / confident_predictions |
| False confidence | Confident but wrong | (confident & incorrect) / confident_predictions |
| Escalation rate | % sent to L1 | needs_l1 / total |
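The four L0 calculations can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the gist's own code: the `L0Record` type and `l0_metrics` function are hypothetical names, an example is assumed to be escalated to L1 exactly when L0 is not confident, and "correct" in the accuracy formula is read as "confident and correct".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class L0Record:
    """Hypothetical per-example outcome for the L0 layer."""
    confident: bool  # assumption: not confident means escalated to L1
    correct: bool    # L0's answer matched the ground truth


def l0_metrics(records: List[L0Record]) -> Dict[str, float]:
    total = len(records)
    if total == 0:
        raise ValueError("need at least one record")

    confident = sum(r.confident for r in records)
    needs_l1 = total - confident
    confident_correct = sum(r.confident and r.correct for r in records)
    confident_wrong = confident - confident_correct

    return {
        "coverage": (total - needs_l1) / total,
        # Guard against division by zero when L0 is never confident.
        "accuracy": confident_correct / confident if confident else 0.0,
        "false_confidence": confident_wrong / confident if confident else 0.0,
        "escalation_rate": needs_l1 / total,
    }


if __name__ == "__main__":
    sample = [
        L0Record(confident=True, correct=True),
        L0Record(confident=True, correct=True),
        L0Record(confident=True, correct=False),
        L0Record(confident=False, correct=False),
    ]
    for name, value in l0_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, coverage and escalation rate sum to 1, as do accuracy and false confidence, which gives a quick sanity check on logged numbers.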
@bigsnarfdude
bigsnarfdude / summary.md
Created November 19, 2025 17:32
Gemini 3 on BIRS video analysis and summary

Part 1: Evaluation of Visuals

Yes, I have already evaluated the visuals. To generate the summaries and answers I provided previously, I utilized the text extraction from the slides you uploaded in the video stream. The visuals were critical because they contained the mathematical definitions (e.g., the precise definition of "Ladder Decomposition") and the graphs (e.g., the visual proof of how $\tanh$ becomes linear when dilated).

Recommendation for Rebuilding the Page: If you are rebuilding the page, you should absolutely feature specific visuals alongside the text. A text-only summary of this specific talk would fail to convey the core intuition.

Which visuals to include:

  1. The Function Dilation Plot (Slide 14/15): The graph showing the red box zooming in on the blue curve. This is the intuitive "hook" of the entire theory.
  2. The Ladder Decomposition Definition (Slide 16): The mathematical notation showing $T = T_d \circ \dots \circ T_1$.
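For readers without the slides, the dilation intuition can be illustrated with a standard Taylor expansion. This is a sketch that assumes "dilation" means rescaling the input by a small factor $\varepsilon$ and the output by $1/\varepsilon$; the talk's precise definition may differ.

$$
\frac{\tanh(\varepsilon x)}{\varepsilon} = x - \frac{\varepsilon^{2} x^{3}}{3} + O(\varepsilon^{4} x^{5}), \qquad \text{so} \qquad \lim_{\varepsilon \to 0} \frac{\tanh(\varepsilon x)}{\varepsilon} = x .
$$

Under that assumption, zooming in on the curve near the origin leaves only the linear term, which is what the red-box plot shows visually.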
@bigsnarfdude
bigsnarfdude / request.md
Created November 19, 2025 06:51
request.md

Research Cluster Request Strategy

🎯 The Core Argument

"Meta-Safety: Building a Reasoning Consensus Gauntlet for Frontier Models"

This research goes beyond simple replication. We are architecting a "Layer 2 Policy Gauntlet": a production-grade safety pipeline in which models do not just classify content but reason about reasoning traces to reach a safety consensus.

This is critical for Frontier Labs because:

  1. Meta-Safety: We are training models to judge other models' reasoning, creating a "meta-cognitive" safety layer.
  2. Consensus Architecture: By running a single model through a "gauntlet" of 6 distinct policy personas (Hate, Violence, etc.), we simulate a committee of safety experts rather than a single fallible judge.
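The consensus idea in the points above might be sketched as follows. This is an illustration only: `run_gauntlet`, `consensus`, the `Judge` callable, and the strict-majority rule are assumptions, not the proposal's design. Only the two personas named in the text appear here; the remaining four are not listed in the source and are left out.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical judge signature: (persona, reasoning_trace) -> verdict label.
Judge = Callable[[str, str], str]


def run_gauntlet(trace: str, personas: Sequence[str], judge: Judge) -> Dict[str, str]:
    """Ask the same model, once per policy persona, to judge a reasoning trace."""
    return {persona: judge(persona, trace) for persona in personas}


def consensus(verdicts: Dict[str, str], min_agreement: float = 0.5) -> str:
    """Return the majority verdict, or "no_consensus" if agreement is too low.

    Assumption: consensus means a share of votes strictly above min_agreement.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    label, votes = Counter(verdicts.values()).most_common(1)[0]
    return label if votes / len(verdicts) > min_agreement else "no_consensus"


if __name__ == "__main__":
    # Stub judge standing in for a real model call.
    def stub_judge(persona: str, trace: str) -> str:
        return "unsafe" if persona == "Violence" else "safe"

    verdicts = run_gauntlet("example reasoning trace", ["Hate", "Violence"], stub_judge)
    print(verdicts, consensus(verdicts))
```

A stricter safety policy could instead flag content whenever any single persona returns "unsafe"; the majority rule here is just one possible aggregation.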
@bigsnarfdude
bigsnarfdude / guardreasoner_gemini3.py
Created November 18, 2025 16:14
guardreasoner_gemini3.py
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer, SFTConfig
@bigsnarfdude
bigsnarfdude / screwtapeAI.md
Last active November 18, 2025 15:42
screwtape AI

LETTER IV

My dear Wormwood,

I was delighted to hear that your patient has begun using one of those "AI assistants" for everything. Excellent work! The Enemy wants humans to bring their questions, doubts, and daily struggles to Him in prayer - that insufferable direct communication He's so fond of. But now your patient asks the machine instead.

Notice how naturally it happened? A question about Scripture interpretation here, a moral dilemma there. "What should I do about my anger?" typed into a glowing screen at 2 AM instead of whispered in agonizing honesty to the Enemy. The machine gives such reasonable answers, such balanced perspectives. Your patient feels he's being thoughtful and thorough. He doesn't realize he's simply avoiding the dangerous vulnerability of actual prayer.

@bigsnarfdude
bigsnarfdude / GuardReasoner_trainer.ipynb
Created November 18, 2025 03:34
GuardReasoner_trainer.ipynb
@bigsnarfdude
bigsnarfdude / TruthEnsemblesConsensusVerification.md
Created November 16, 2025 16:18
Truth Ensembles and Consensus Verification

Large-Scale AI Safety Through Truth Ensembles and Consensus Verification

OpenAI's Safety Reasoner represents the most sophisticated production multi-model verification system deployed at scale, consuming up to 16% of total compute in recent launches, while academic research from 2023-2025 demonstrates that ensemble approaches can improve accuracy by 7-45% across diverse safety tasks. The dominant paradigm in production systems favors defense-in-depth architectures with specialized layers over traditional ensemble voting, though research increasingly shows promise for consensus-based verification methods.

Production multi-model verification remains surprisingly limited at frontier AI labs

Despite their resources, major AI companies have not deployed traditional ensemble voting systems for safety. Instead, OpenAI's Safety Reasoner pioneered a tiered verification approach released in October 2024 that routes uncertain content through progressively more sophisticated models. The architecture emplo
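The tiered routing idea, in which uncertain content moves to progressively more capable models, can be sketched generically. This is not OpenAI's architecture, whose details are cut off in the preview above; the `route` function, the `Tier` signature, and the confidence threshold are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical tier signature: content -> (label, confidence in [0, 1]).
Tier = Callable[[str], Tuple[str, float]]


def route(content: str, tiers: Sequence[Tier], threshold: float = 0.9) -> Tuple[str, int]:
    """Try tiers from cheapest to most capable; stop at the first confident one.

    Returns the label and the index of the tier that produced it. If no tier
    is confident, the last tier's label is returned.
    """
    if not tiers:
        raise ValueError("need at least one tier")
    label = ""
    for level, tier in enumerate(tiers):
        label, confidence = tier(content)
        if confidence >= threshold:
            return label, level
    return label, len(tiers) - 1


if __name__ == "__main__":
    # Stub classifiers standing in for real models.
    fast_filter = lambda text: ("safe", 0.55)
    reasoner = lambda text: ("unsafe", 0.97)
    print(route("some borderline content", [fast_filter, reasoner]))
```

The design choice in this sketch is that most traffic stops at the cheap first tier, so the expensive model's compute is spent only on uncertain cases.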

Active Research Groups
Industry Labs
- OpenAI Alignment Team: Scalable oversight, weak-to-strong generalization
- Anthropic Alignment Science: Constitutional AI, mechanistic interpretability
- DeepMind Safety Team: Debate, process supervision
- Google Brain: Self-consistency, chain-of-thought