DSamuelHodge
@DSamuelHodge
DSamuelHodge / topology_filtration_semantic_ring.py
Created July 21, 2025 16:42
Code for topology filtration of attention using GUDHI.
# === Imports ===
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer
import gudhi as gd
from sklearn.manifold import MDS
import warnings
warnings.filterwarnings("ignore")
# Topological Reasoning in Transformers: Semantic Loop Analysis
# Implementation of "Beyond Reinforcement Learning" - Geometric Theory of Transformer Reasoning
"""
ABSTRACT:
We introduce a geometric theory of reasoning in Transformer models based on attention-induced
topological structures. This notebook demonstrates that reasoning emerges from closed, high-energy
attention loops (semantic circuits) measurable through loop energy, holonomy, and attention geometry.
This topological reasoning model enables prompt design and evaluation without external reward policies.

Topological Reasoning in Transformers: Beyond Reinforcement Learning

Abstract

We introduce a geometric theory of reasoning in Transformer models based on attention-induced topological structures. Contrary to reinforcement learning-based paradigms that impose reasoning via reward optimization, we demonstrate that reasoning naturally emerges from closed, high-energy attention loops—semantic circuits measurable through loop energy, holonomy, and Ricci curvature. This topological reasoning model enables prompt design, evaluation, and model alignment without external reward policies.
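A minimal sketch of the loop-detection step the abstract describes, assuming a single symmetrized attention matrix as input; the variable names, thresholds, and distance conversion below are illustrative, not the gist's exact pipeline:

# Sketch only: attn is a stand-in for one attention head's matrix.
import numpy as np
import gudhi as gd

attn = np.random.rand(12, 12)
attn = (attn + attn.T) / 2.0               # symmetrize so it behaves like a similarity
dist = 1.0 - attn / attn.max()             # convert similarity into a rough distance
np.fill_diagonal(dist, 0.0)

rips = gd.RipsComplex(distance_matrix=dist, max_edge_length=1.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diagram = simplex_tree.persistence()       # list of (dimension, (birth, death)) pairs
loops = [p for p in diagram if p[0] == 1]  # H1 features ~ closed attention loops
print(f"{len(loops)} candidate semantic loops")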


1. Introduction

@DSamuelHodge
DSamuelHodge / thermodynamic_analyzer.py
Created May 18, 2025 13:49
Thermodynamics Analyzer: Analyzes language models through the lens of statistical thermodynamics. Calculates temperature (weight/gradient norm ratio), entropy (from singular values), energy (curvature), and derived metrics across layers. Identifies potential phase transitions using susceptibility (dG/dT), compressibility (dS/dF), and inter-laye…
import torch
import numpy as np
import pandas as pd
from scipy.linalg import svdvals
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import logging as transformers_logging
import logging
# ──────────────────── CONFIGURATION PARAMETERS ────────────────────────────────
MODEL_NAME = "Qwen/Qwen2.5-0.5B"
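A hedged sketch of the "entropy from singular values" step mentioned in the description above; the spectral normalization used here is an assumption, not necessarily the gist's exact formula:

import numpy as np
from scipy.linalg import svdvals

W = np.random.randn(256, 256)              # stand-in for one layer's weight matrix
s = svdvals(W)                             # singular values of the layer
p = s**2 / np.sum(s**2)                    # normalized spectral distribution
entropy = -np.sum(p * np.log(p + 1e-12))   # Shannon entropy of the spectrum
print(f"spectral entropy: {entropy:.3f}")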

Core AEG Components in the Code

  1. Complex Normalization Function
def normalize(self, z):
    rho = th.abs(z)                                # magnitude of the complex value
    theta = th.atan2(th.imag(z), th.real(z))       # phase angle
    return th.tanh(rho) * (th.cos(theta) + 1.0j * th.sin(theta))  # squash magnitude, keep phase

This function maps complex values onto a curved manifold by compressing the magnitude through tanh while preserving the phase angle, so every output lies inside the complex unit disk.
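A quick usage sketch; the surrounding analyzer class is not shown in the preview, so the call below is written standalone, with torch aliased as th to match the snippet:

import torch as th

z = th.tensor([3.0 + 4.0j, 0.05 - 0.02j])
rho = th.abs(z)
theta = th.atan2(th.imag(z), th.real(z))
out = th.tanh(rho) * (th.cos(theta) + 1.0j * th.sin(theta))
print(th.abs(out))   # magnitudes now bounded by 1; phases unchanged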

@DSamuelHodge
DSamuelHodge / HTSR_Theory_Notion_Tables.md
Last active February 23, 2025 03:19
Understanding deep learning through Heavy-Tailed Self-Regularization (HTSR) theory by Charles Martin, PhD.

Acronyms and Notation Tables

Table 1: Definitions of acronyms used in HTSR

| Acronym | Description |
|---------|-------------|
| DNN | Deep Neural Network |
| ML | Machine Learning |
| SGD | Stochastic Gradient Descent |
| RMT | Random Matrix Theory |
@DSamuelHodge
DSamuelHodge / ww_advanced_usage.md
Last active February 17, 2025 21:37
WeightWatcher advanced features organized into logical categories to make it easier to find specific functionality when you need it.

WeightWatcher Advanced Usage Cheatsheet

🔍 Basic Analysis

| Feature | Command | Description |
|---------|---------|-------------|
| Analyze Model Layers | `watcher.analyze()` | Analyze model layers for generalization, spectral properties, and overtraining. |
| Describe Model | `watcher.describe(model=model)` | Get model details without analyzing it. |
| Plot and Fit ESD | `watcher.analyze(plot=True)` | Plot the Empirical Spectral Density (ESD) of model layers and apply fits. |
| Generate Summary Statistics | `summary = watcher.get_summary()` | Generate summary statistics from analysis results to compare models. |
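A minimal end-to-end sketch tying these commands together; loading a torchvision ResNet here is just an assumption, and any supported PyTorch or Keras model works:

import weightwatcher as ww
import torchvision.models as models

model = models.resnet18(weights=None)      # placeholder model for the example
watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()                # per-layer spectral metrics (DataFrame)
summary = watcher.get_summary()            # aggregate metrics for comparing models
print(summary)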
@DSamuelHodge
DSamuelHodge / deepseek-grpo-explainer.py
Created February 14, 2025 17:35
GRPO equation explained in Manim
from manim import *
class GRPOExplanation(MovingCameraScene):
    def construct(self):
        # Title
        title = Text("DeepSeek-R1 Reinforcement Learning", font_size=36)
        subtitle = Text("Group Relative Policy Optimization (GRPO)", font_size=24)
        title_group = VGroup(title, subtitle).arrange(DOWN, buff=0.5)
        title_group.to_edge(UP, buff=1)
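A hedged sketch of how such a scene might continue, rendering the group-relative advantage that gives GRPO its name; the animation steps and formula layout below are assumptions, not the gist's code:

from manim import MathTex, Scene, Write

class GRPOAdvantage(Scene):
    def construct(self):
        # Group-relative advantage: rewards standardized within a sampled group.
        adv = MathTex(
            r"\hat{A}_i = \frac{r_i - \mathrm{mean}(r_1,\ldots,r_G)}"
            r"{\mathrm{std}(r_1,\ldots,r_G)}"
        )
        self.play(Write(adv))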
@DSamuelHodge
DSamuelHodge / replace_attention_layers.py
Created November 6, 2024 03:52
Replaces standard LlamaAttention layers with differential SDPA attention layers in a Llama model.
import os
import math
from typing import List, Optional, Tuple, Union
import torch
import torch.nn.functional as F
from torch import nn
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from transformers.models.llama.modeling_llama import (
@DSamuelHodge
DSamuelHodge / LlamaDiffSdpaAttention.py
Last active November 25, 2024 19:15
Differential Transformer Attention mechanism for Meta Llama 3.2 1B (Instruct).
'''
Differential Transformer Attention mechanism for Llama models.
mechanism. This implementation replaces the standard LlamaSdpaAttention with a differential attention
mechanism that computes attention scores as the difference between two separate softmax
attention maps. This approach helps reduce noise and creates sparse attention patterns,
leading to improved performance in various NLP tasks.
Implementation based on research by:
Ye, T., Dong, L., Xia, Y., Sun, Y., Zhu, Y., Huang, G., & Wei, F. (2024)
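A minimal sketch of the differential attention computation described above; the shapes and the fixed lambda below are simplifying assumptions rather than the gist's parameterization:

import torch
import torch.nn.functional as F

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    scale = q1.size(-1) ** -0.5
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) * scale, dim=-1)  # first softmax map
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) * scale, dim=-1)  # second softmax map
    return (a1 - lam * a2) @ v             # attention as the difference of the two maps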