We introduce a geometric theory of reasoning in Transformer models based on attention-induced topological structures. In contrast to reinforcement learning paradigms that impose reasoning via reward optimization, we demonstrate that reasoning emerges naturally from closed, high-energy attention loops: semantic circuits that can be measured through loop energy, holonomy, and Ricci curvature. This topological reasoning model enables prompt design, evaluation, and model alignment without external reward policies.
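To make the loop-energy idea concrete, the minimal sketch below treats an attention matrix as a weighted directed graph over tokens and scores a closed token cycle by the summed log attention weight along its edges. The attention matrix is synthetic and this scalar definition of loop energy is an illustrative assumption, not the paper's formal functional.

```python
import numpy as np

def loop_energy(attn: np.ndarray, cycle: list[int]) -> float:
    """Score a closed token cycle by the sum of log attention weights
    along its edges (higher = stronger semantic circuit).

    NOTE: this scalar definition is an illustrative assumption, not a
    formal loop-energy functional taken from the paper.
    """
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))  # close the loop
    return float(sum(np.log(attn[i, j] + 1e-12) for i, j in edges))

# Synthetic row-stochastic attention matrix over 5 tokens.
rng = np.random.default_rng(0)
attn = rng.random((5, 5))
attn /= attn.sum(axis=1, keepdims=True)

print(loop_energy(attn, [0, 2, 4]))  # energy of the cycle 0 -> 2 -> 4 -> 0
```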
```python
# === Imports ===
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer
import gudhi as gd
from sklearn.manifold import MDS
import warnings

warnings.filterwarnings("ignore")
```
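The imports above (gudhi for persistent homology, MDS for low-dimensional embedding) suggest a topological pipeline over attention-derived distances. The sketch below is one way to wire them together, assuming the token-token distance is defined as 1 minus the symmetrized attention weight; that distance choice and the random attention matrix are assumptions for illustration.

```python
import numpy as np
import gudhi as gd
from sklearn.manifold import MDS

# Synthetic row-stochastic attention matrix over 8 tokens (stand-in for a real head).
rng = np.random.default_rng(0)
attn = rng.random((8, 8))
attn /= attn.sum(axis=1, keepdims=True)

# Assumed token-token distance: 1 - symmetrized attention weight.
dist = 1.0 - 0.5 * (attn + attn.T)
np.fill_diagonal(dist, 0.0)

# Persistent homology of the attention-induced metric space.
rips = gd.RipsComplex(distance_matrix=dist, max_edge_length=1.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diagram = simplex_tree.persistence()
loops = [p for p in diagram if p[0] == 1]  # H1 features ~ closed attention loops
print("H1 features (birth, death):", loops)

# 2-D embedding of the same metric for visualization.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dist)
print(coords.shape)
```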
```python
# Topological Reasoning in Transformers: Semantic Loop Analysis
# Implementation of "Beyond Reinforcement Learning" - Geometric Theory of Transformer Reasoning
"""
ABSTRACT:
We introduce a geometric theory of reasoning in Transformer models based on attention-induced
topological structures. This notebook demonstrates that reasoning emerges from closed, high-energy
attention loops: semantic circuits measurable through loop energy, holonomy, and attention geometry.
This topological reasoning model enables prompt design and evaluation without external reward policies.
"""
```
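The abstract also names holonomy. One possible operationalization, used purely as an illustration here, is to compose one linear "transport" map per hop of a closed token loop and measure how far the composite deviates from the identity. Modelling each hop as a linear map and scoring the defect with a Frobenius norm are both assumptions of this sketch, not definitions taken from the paper.

```python
import numpy as np

def holonomy_defect(maps: list[np.ndarray]) -> float:
    """Compose one linear transport map per hop of a closed loop and
    return the deviation of the composite from the identity.

    NOTE: the linear-transport model and the Frobenius-norm defect are
    illustrative assumptions.
    """
    composite = np.eye(maps[0].shape[0])
    for m in maps:
        composite = m @ composite
    return float(np.linalg.norm(composite - np.eye(composite.shape[0])))

# Three random near-identity hop maps standing in for attention-weighted
# value transforms along a 3-token loop.
rng = np.random.default_rng(1)
hops = [np.eye(4) + 0.1 * rng.standard_normal((4, 4)) for _ in range(3)]
print("holonomy defect:", holonomy_defect(hops))
```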
```python
import torch
import numpy as np
import pandas as pd
from scipy.linalg import svdvals
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import logging as transformers_logging
import logging

# ──────────────────── CONFIGURATION PARAMETERS ────────────────────────────────
MODEL_NAME = "Qwen/Qwen2.5-0.5B"
```
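Given the `svdvals` import and the configured checkpoint, this configuration presumably feeds a spectral analysis of attention maps. The self-contained sketch below loads the model, requests attention weights with `output_attentions=True`, and computes the singular-value spectrum of one head's attention matrix; the prompt and the layer/head choice are placeholders.

```python
import numpy as np
import torch
from scipy.linalg import svdvals
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# attn_implementation="eager" ensures attention weights are actually returned.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, output_attentions=True, attn_implementation="eager"
)
model.eval()

prompt = "If all birds can fly and a penguin is a bird, can a penguin fly?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
attn = outputs.attentions[0][0, 0].numpy()  # layer 0, head 0 (arbitrary choice)
spectrum = svdvals(attn)                    # singular values of the attention map
print("top-5 singular values:", np.round(spectrum[:5], 4))
```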
| Feature | Command | Description |
|---|---|---|
| Analyze Model Layers | `watcher.analyze()` | Analyze model layers for generalization, spectral properties, and overtraining. |
| Describe Model | `watcher.describe(model=model)` | Get model details without analyzing it. |
| Plot and Fit ESD | `watcher.analyze(plot=True)` | Plot the Empirical Spectral Density (ESD) of model layers and apply fits. |
| Generate Summary Statistics | `summary = watcher.get_summary()` | Generate summary statistics from analysis results to compare models. |
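These commands match the WeightWatcher library's API. A minimal end-to-end usage sketch, assuming a `weightwatcher` installation and using a small Hugging Face checkpoint as a stand-in model:

```python
import weightwatcher as ww
from transformers import AutoModelForCausalLM

# Any torch model works; this checkpoint is just a small placeholder.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

watcher = ww.WeightWatcher(model=model)
details = watcher.describe()            # per-layer metadata, no heavy analysis
results = watcher.analyze(plot=False)   # per-layer spectral metrics (alpha, etc.)
summary = watcher.get_summary(results)  # aggregate statistics for model comparison
print(summary)
```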
```python
from manim import *

class GRPOExplanation(MovingCameraScene):
    def construct(self):
        # Title
        title = Text("DeepSeek-R1 Reinforcement Learning", font_size=36)
        subtitle = Text("Group Relative Policy Optimization (GRPO)", font_size=24)
        title_group = VGroup(title, subtitle).arrange(DOWN, buff=0.5)
        title_group.to_edge(UP, buff=1)
```
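The scene above animates GRPO, the reinforcement learning algorithm behind DeepSeek-R1. The quantity GRPO is named for is the group-relative advantage: each sampled completion's reward is normalized against the mean and standard deviation of its sampling group. A standalone sketch of that normalization (the reward values are made up for illustration):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: standardize each reward against its group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Rewards for a group of 6 completions sampled from the same prompt.
rewards = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.5])
print(group_relative_advantages(rewards).round(3))
```

The animation itself can be rendered with `manim -pql <file>.py GRPOExplanation`, where the file name is a placeholder.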
```python
import os
import math
from typing import List, Optional, Tuple, Union

import torch
import torch.nn.functional as F
from torch import nn
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from transformers.models.llama.modeling_llama import (
```
```python
'''
Differential Transformer Attention mechanism for Llama models.

This implementation replaces the standard LlamaSdpaAttention with a differential attention
mechanism that computes attention scores as the difference between two separate softmax
attention maps. This approach helps reduce noise and creates sparse attention patterns,
leading to improved performance in various NLP tasks.

Implementation based on research by:
Ye, T., Dong, L., Xia, Y., Sun, Y., Zhu, Y., Huang, G., & Wei, F. (2024). Differential Transformer.
'''
```
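The docstring describes computing attention as the difference of two softmax maps. Below is a self-contained sketch of that core operation, independent of the Llama integration above; the fixed `lam` value and the omission of rotary embeddings, causal masking, and the paper's learned lambda re-parameterization are simplifications for illustration.

```python
import torch
import torch.nn.functional as F

def differential_attention(q1, k1, q2, k2, v, lam: float = 0.5):
    """Core differential-attention step: subtract a second softmax attention
    map, scaled by lambda, from the first, then aggregate values.

    Shapes: q*, k* are (batch, seq, d_head); v is (batch, seq, d_v).
    Rotary embeddings, causal masking, and the learned lambda of the
    paper are omitted for brevity.
    """
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ v

# Toy check on random tensors.
b, s, d = 2, 5, 16
q1, k1, q2, k2 = (torch.randn(b, s, d) for _ in range(4))
v = torch.randn(b, s, d)
out = differential_attention(q1, k1, q2, k2, v)
print(out.shape)  # torch.Size([2, 5, 16])
```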