The Sphere Keeps Showing Up: How TurboQuant, Qubits, and Autoresearch Point Toward a Geometric Theory of Information
A research diary / open question / call for experiments
On March 24, 2026, Google Research published TurboQuant — a compression algorithm that reduces large language model weights to 3-4 bits with zero accuracy loss and up to 8x performance gains on H100 GPUs. The headline numbers are impressive, but the mechanism is what matters here.
TurboQuant's core component, PolarQuant, works by converting weight vectors from standard Cartesian coordinates into polar coordinates — a radius capturing magnitude and a set of angles capturing direction. The key discovery: after this change of coordinates, the angles become highly concentrated and predictable. The information wants to live on a sphere. The researchers didn't force spherical geometry onto the data — they found it was already there, hidden inside 32-bit floats, most of whose bits were doing no useful work.
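To make the change of coordinates concrete, here is a minimal numpy sketch — illustrative Gaussian data standing in for real weights, and a simple per-pair polar view; the actual PolarQuant construction differs in its details:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a weight matrix: 1024 vectors of dimension 64.
# (Illustrative Gaussian data, not real LLM weights.)
W = rng.standard_normal((1024, 64))

# Split each vector into a magnitude (radius) and a direction
# (a point on the unit sphere).
radii = np.linalg.norm(W, axis=1, keepdims=True)
directions = W / radii                  # unit vectors: everything except scale

# Per coordinate pair (x, y) -> (r, theta): the 2-D polar view of the same data.
x, y = W[:, 0::2], W[:, 1::2]
theta = np.arctan2(y, x)                # angles in (-pi, pi]
r2d = np.hypot(x, y)

# For isotropic Gaussian data the angles are uniform; the empirical claim is
# that for trained weights they concentrate, which makes them cheap to quantize.
print("angle std:", theta.std())
```

For random data the angle distribution is flat; the interesting measurement is running the same conversion on actual trained weights.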
The second component, QJL (Quantized Johnson-Lindenstrauss), compresses the residual error down to a single sign bit per element: +1 or −1. Zero memory overhead. Together, TurboQuant achieves 6x compression on the key-value cache while maintaining full model accuracy across every benchmark tested.
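A toy version of the sign-bit idea, using a generic Gaussian projection and made-up dimensions — the paper's actual construction and scaling may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 64, 256                          # original dim, projected dim (made up here)
S = rng.standard_normal((m, d))         # random Gaussian projection

def qjl_encode(x):
    # Keep only the sign of each projected coordinate: one bit per element.
    return np.sign(S @ x)

def qjl_inner(bits_x, y, norm_x):
    # Estimate <x, y> from x's sign bits plus x's stored norm, using
    # E[sign(<s, x>) * <s, y>] = sqrt(2/pi) * <x/||x||, y> for Gaussian s.
    return norm_x * np.sqrt(np.pi / 2) / m * (bits_x @ (S @ y))

x = rng.standard_normal(d)
y = rng.standard_normal(d)
print("estimate:", qjl_inner(qjl_encode(x), y, np.linalg.norm(x)))
print("exact:   ", x @ y)
```

The point of the sketch: once the direction is captured by sign bits and the magnitude by a single stored norm, inner products — the workhorse of attention — can still be estimated.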
Here's where it gets interesting. The mathematical structure of PolarQuant is strikingly parallel to how quantum computing represents information.
A qubit — the fundamental unit of quantum information — is represented on the Bloch sphere using two angles: θ (polar) and φ (azimuthal). The state is defined by its direction on the sphere, not by Cartesian coordinates. The radius is fixed at 1 (normalization constraint). Quantum gates are rotations on this sphere. Measurement collapses the state to a binary outcome: 0 or 1.
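In code, with the standard parameterization |ψ⟩ = cos(θ/2)|0⟩ + e^{iφ} sin(θ/2)|1⟩:

```python
import numpy as np

def bloch_state(theta, phi):
    """Qubit state from its two Bloch-sphere angles:
    |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2),
                     np.exp(1j * phi) * np.sin(theta / 2)])

psi = bloch_state(np.pi / 3, np.pi / 4)

# The Born rule forces |alpha|^2 + |beta|^2 = 1: the state lives on a sphere.
print(abs(psi[0])**2 + abs(psi[1])**2)

# A gate is a rotation of this sphere, e.g. a rotation about the Z axis:
def rz(angle):
    return np.diag([np.exp(-1j * angle / 2), np.exp(1j * angle / 2)])

rotated = rz(np.pi / 2) @ psi   # rotations preserve the norm, hence the sphere
```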
Now compare:
| | Qubit (Bloch sphere) | TurboQuant (PolarQuant + QJL) |
|---|---|---|
| Representation | Polar coordinates on unit sphere | Polar coordinates on data manifold |
| Primary info carrier | Angles (θ, φ) | Angles (concentrated, predictable) |
| Magnitude | Fixed (normalized to 1) | Stored separately as radius |
| Operations | Rotations (quantum gates) | Random rotations (data preprocessing) |
| Binary output | Measurement → 0 or 1 | QJL sign bit → +1 or −1 |
| Why spheres? | Probabilities must sum to 1 (\|α\|² + \|β\|² = 1) | Learned representations concentrate on spherical manifolds |
The parallel is not that TurboQuant is doing quantum computing. The physics is fundamentally different — qubits exploit superposition and entanglement, which are genuinely non-classical phenomena; TurboQuant is deterministic and classical.
The parallel is deeper: both systems independently converge on polar/spherical geometry as the most efficient encoding for high-dimensional information under constraints. Quantum mechanics arrives there because the Born rule (probabilities sum to 1) forces states onto a sphere. Neural networks arrive there because learned representations naturally concentrate on low-dimensional spherical manifolds, as TurboQuant empirically demonstrates.
This suggests that spherical geometry isn't a convenient choice — it may be the natural coordinate system of constrained, normalized, high-dimensional information, regardless of whether that information is quantum or classical.
TurboQuant's results strongly suggest that the output of training — the final learned weights — lives on a spherical manifold. But training itself is still done entirely in Cartesian coordinates. Every forward pass, backward pass, and gradient update operates on 32-bit floats in a rectilinear coordinate system.
This is like knowing your destination is on a sphere but insisting on navigating with a flat map. You'll get there eventually, but you're wasting enormous effort on coordinate translations that serve no purpose.
What if we trained natively in polar coordinates?
If the destination is spherical, the optimization path might be dramatically shorter and more efficient in spherical coordinates. Potential implications:
- Smaller memory footprint during training — polar representations are inherently more compact
- Faster convergence — the optimizer searches over bounded, periodic angles instead of unbounded floats
- Fewer GPUs needed — more of the model fits in memory; gradients are cheaper to compute
- Natural regularization — the spherical constraint is built into the parameterization, not imposed externally
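As a concrete sketch of what "training natively in polar coordinates" could mean, here is one possible hyperspherical parameterization (one choice among several — and the cumulative sine product is exactly the kind of numerical-stability question raised in the text):

```python
import numpy as np

def polar_to_cartesian(r, thetas):
    """Map a radius and n-1 hyperspherical angles to an n-dim weight vector.
    A polar-native trainer would update (r, thetas) directly instead of
    the Cartesian entries of w."""
    n = len(thetas) + 1
    w = np.empty(n)
    sin_prod = 1.0
    for i, t in enumerate(thetas):
        w[i] = sin_prod * np.cos(t)     # each coordinate peels off one angle
        sin_prod *= np.sin(t)           # accumulate the sine product
    w[-1] = sin_prod                    # last coordinate uses all the sines
    return r * w

w = polar_to_cartesian(2.5, np.array([0.7, 1.2, 0.4]))
print(np.linalg.norm(w))                # the norm is the radius by construction
```

Note how the spherical constraint is automatic here: whatever values the angles take, the direction part is a unit vector, so no external normalization penalty is needed.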
This is speculative. There are real engineering challenges: numerical stability of angular representations, the need for Riemannian optimizers that respect curved geometry, custom CUDA kernels, and the question of whether gradient flow behaves well in polar coordinates. Many first attempts at these will likely fail.
But the TurboQuant result provides an unusually strong empirical anchor. We know the answer lives on a sphere. The question is whether we can take a spherical path to get there.
It's worth noting that biological neural networks may have already solved this problem. Neurons fire or don't fire — a binary sign bit. Neural populations encode information as directions in high-dimensional activation space — angular encoding. The brain operates at roughly 20 watts, compared to megawatts for GPU clusters training frontier models.
If biological neural networks natively compute in something resembling polar/spherical geometry — and their extraordinary energy efficiency is partly a consequence of this — then TurboQuant may be rediscovering a principle that evolution found billions of years ago.
This is highly speculative and not directly testable with current neuroscience tools. But it's a suggestive pattern.
On March 7, 2026, Andrej Karpathy released autoresearch — a system that lets an AI agent autonomously run ML experiments in a loop. The agent modifies training code, trains for 5 minutes, evaluates the result, keeps improvements, and repeats. It runs approximately 100 experiments overnight on a single GPU.
This is the perfect tool for exploring polar native training. The search space is large (parameterization choices, optimizer selection, initialization strategies, handling of radius vs. angle components), and most individual experiments will fail. A human researcher might try 2-3 variations before getting discouraged. Autoresearch tries 100 without getting tired.
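The loop is simple enough to caricature. A toy stand-in (placeholder functions, not the actual autoresearch API — the real agent edits training code rather than nudging two hyperparameters):

```python
import random

random.seed(3)

def evaluate(config):
    # Stand-in objective: pretend val_bpb depends on two hyperparameters
    # with an optimum at (0.3, 0.7). Lower is better.
    a, b = config
    return (a - 0.3) ** 2 + (b - 0.7) ** 2

def agent_modify(config):
    # Stand-in for the agent proposing a mutation of the current best setup.
    a, b = config
    return (a + random.gauss(0, 0.1), b + random.gauss(0, 0.1))

best, best_score = (0.0, 0.0), evaluate((0.0, 0.0))
for _ in range(100):                    # ~100 experiments overnight
    candidate = agent_modify(best)      # "modify the training code"
    score = evaluate(candidate)         # "train 5 minutes, evaluate"
    if score < best_score:              # keep improvements, discard the rest
        best, best_score = candidate, score
print(best_score)
```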
Setup: Fork autoresearch. Modify the RESEARCH.md to direct the agent toward spherical/polar training explorations.
Suggested research directions for the agent:
- Represent model weights as radius-angle pairs instead of Cartesian floats
- Initialize weights on a hypersphere rather than using standard Gaussian initialization
- Implement or adapt a Riemannian optimizer (e.g., Riemannian SGD or Riemannian Adam) that respects spherical geometry
- Measure whether convergence speed improves when the parameterization matches the known destination geometry
- Compare validation bits-per-byte (val_bpb) against the standard Cartesian baseline
- Explore hybrid approaches: Cartesian for some layers, polar for others
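The Riemannian-optimizer direction can be sketched in a few lines: project the Euclidean gradient onto the tangent space of the sphere, take a step, and retract back onto the sphere by renormalizing. A minimal sketch on a toy objective, not Riemannian Adam:

```python
import numpy as np

def riemannian_sgd_step(w, grad, lr=0.1):
    """One Riemannian SGD step on the unit sphere."""
    tangent_grad = grad - (grad @ w) * w   # remove the radial component
    w_new = w - lr * tangent_grad          # Euclidean step in the tangent plane
    return w_new / np.linalg.norm(w_new)   # retraction: back onto the sphere

# Toy objective: maximize <w, target>, i.e. loss = -<w, target>, grad = -target.
rng = np.random.default_rng(2)
target = rng.standard_normal(16); target /= np.linalg.norm(target)
w = rng.standard_normal(16); w /= np.linalg.norm(w)
for _ in range(200):
    w = riemannian_sgd_step(w, -target)
print(w @ target)   # approaches 1: w converges to the target direction
```

The constraint never has to be enforced by a penalty term — every iterate is exactly on the sphere, which is the "natural regularization" argument in miniature.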
Metric: val_bpb (same as standard autoresearch). Keep anything that improves over the Cartesian baseline.
Expected outcome: Most experiments will fail or regress. But if even one configuration shows improvement, it validates the hypothesis that the optimization path benefits from matching the destination geometry. That single result would be worth publishing.
If this works — even partially — it implies something about the nature of learned representations that goes beyond any single compression trick or training optimization.
It would suggest that there exists a geometric theory of information — a framework that explains why normalized, constrained, high-dimensional information naturally organizes on spheres regardless of substrate (quantum, classical neural network, biological neuron). Such a theory would:
- Unify insights from quantum information theory and deep learning
- Predict which coordinate systems are optimal for which types of computation
- Potentially explain what kinds of problems are learnable — perhaps the learnable problems are precisely those whose solutions live on spherical manifolds
- Provide a mathematical foundation for understanding why certain architectures work and others don't
This is admittedly ambitious and speculative. But TurboQuant provides empirical grounding, the Bloch sphere provides theoretical precedent from physics, and autoresearch provides a practical means of testing the hypothesis tonight.
This is an open research direction. No credentials required — just a GPU and curiosity.
- Run the experiment: Fork autoresearch, add polar training directions, run overnight, report results
- Poke holes: If this reasoning is wrong, explain why. That's equally valuable.
- Extend the theory: Are there other fields where spherical geometry emerges as optimal for information encoding? (Signal processing? Information geometry? Category theory?)
- Connect the literature: There's existing work on Riemannian optimization, hyperspherical learning, and quantum-inspired classical algorithms. Someone with deeper background in these areas could map exactly where this hypothesis sits relative to known results.
The worst case is that we learn something specific about why polar training doesn't work in its naive form. The best case is that we find a thread that connects quantum mechanics, neural network compression, and the fundamental geometry of learning.
The sphere keeps showing up. Maybe it's time we asked why.
This writeup emerged from a conversation exploring the connections between Google Research's TurboQuant (March 2026), quantum information theory, and Karpathy's autoresearch framework. It is not a formal paper — it's an invitation to experiment.
Key references:
- TurboQuant (Zandieh & Mirrokni, 2026)
- PolarQuant (2025)
- Quantized Johnson-Lindenstrauss (2024)
- Autoresearch (Karpathy, 2026)
- Bloch sphere / quantum state representation (standard quantum information theory)