Introducing Log-Linear Attention

We are familiar with standard softmax attention on one hand and with its linear-time alternatives, such as linear attention and State Space Models, on the other. But what lies in the space between these two paradigms? This post introduces Log-Linear Attention, a new approach that occupies that middle ground, with log-linear compute and logarithmic memory.

Features of Log-Linear Attention

Log-Linear Attention is characterized by the following key features:

  • Log-linear time training
  • Logarithmic time and memory at inference
  • Hardware-efficient Triton kernels

[Figure: Log-Linear Attention]

Context and Background

Recent work has focused on efficient alternatives to softmax attention that use sub-quadratic compute and sub-linear memory. These include:

  • Linear Attention
  • State-Space Models
  • Long Convolution Models

Despite the differences among these approaches, many can be encapsulated by a unified equation.

[Figure: Unified equation]
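As a rough sketch of that unified form (the notation below is mine, following the structured masked-attention view, and is not necessarily the paper's exact formulation):

$$
\mathbf{O} \;=\; \bigl(\mathbf{Q}\mathbf{K}^{\top} \odot \mathbf{M}\bigr)\,\mathbf{V},
\qquad \mathbf{M}\in\mathbb{R}^{T\times T},\;\; M_{ts}=0 \ \text{for } s>t,
$$

where the choice of the lower-triangular mask $\mathbf{M}$ determines the model: $M_{ts}=1$ recovers plain causal linear attention, while $M_{ts}=\prod_{i=s+1}^{t} a_i$ with data-dependent scalars $a_i$ recovers decay-gated models such as Mamba-2.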

Mechanism of Log-Linear Attention

Log-Linear Attention imposes a specific structure on the masking matrix M in the unified equation above, so that training cost becomes log-linear in sequence length while inference memory grows only logarithmically. Conceptually, the scheme resembles a Fenwick tree: it hierarchically partitions the preceding context into segments of power-of-two size, with the most recent tokens falling into the smallest segments.

[Figure: Fenwick tree structure]
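To make the partitioning concrete, here is a minimal sketch (the function name and 1-based indexing are mine, not the paper's) of how a length-t prefix decomposes into the power-of-two segments of a Fenwick (binary indexed) tree:

```python
def fenwick_prefix_segments(t: int) -> list[tuple[int, int]]:
    """Decompose the prefix [1, t] into the power-of-two segments a Fenwick
    (binary indexed) tree would use. Returns (start, end) pairs, most recent
    segment first; there are at most O(log t) of them."""
    segments = []
    while t > 0:
        size = t & (-t)                 # largest power of two dividing t
        segments.append((t - size + 1, t))
        t -= size
    return segments

# Example: a prefix of 13 tokens splits into segments of size 1, 4, and 8.
print(fenwick_prefix_segments(13))  # [(13, 13), (9, 12), (1, 8)]
```

The most recent tokens sit in the smallest segments, and each position only ever interacts with O(log t) segment-level summaries, which is where the log-linear compute and logarithmic memory come from.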

Efficient Computation Adaptations

We show how the primitives for efficient chunkwise computation of linear attention can be adapted to the log-linear case. Notably, the off-diagonal blocks of M exhibit low-rank structure, which makes this decomposition possible: diagonal blocks are computed exactly within each chunk, while cross-chunk contributions flow through compact summary states.

[Figure: Efficient computation]
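The chunkwise algorithm and the Triton kernels are best read from the paper itself; as a deliberately naive reference for the quantity being computed, here is a NumPy sketch of the inference-time view, which keeps only O(log t) matrix-valued bucket states. The per-level gates `lam` and the bucket bookkeeping are my simplifications, not the paper's parameterization:

```python
import numpy as np

def log_linear_attention_ref(q, k, v, lam):
    """Naive reference for a log-linear attention forward pass (inference view).

    q, k, v: (T, d) arrays.  lam(t, level) -> scalar gate for the bucket at
    `level` when producing the output at step t.  This is a simplification:
    the real gates are learned and data-dependent, and the real computation
    is chunkwise; here we only illustrate the O(log t) bucket states
    S = sum_i k_i v_i^T that the hierarchy maintains.
    """
    T, d = q.shape
    out = np.zeros((T, d))
    buckets = []  # list of (size, state); sizes mirror the binary digits of t+1
    for t in range(T):
        # Add the current token as a size-1 bucket, then merge equal-sized
        # buckets -- the same cascade as incrementing a binary counter.
        buckets.append((1, np.outer(k[t], v[t])))
        while len(buckets) >= 2 and buckets[-1][0] == buckets[-2][0]:
            (s1, a), (s2, b) = buckets.pop(), buckets.pop()
            buckets.append((s1 + s2, a + b))
        # Read out each bucket with the query, weighted by a per-level gate.
        # Level 0 is the most recent (smallest) bucket.
        o = np.zeros(d)
        for level, (_, state) in enumerate(reversed(buckets)):
            o += lam(t, level) * (q[t] @ state)
        out[t] = o
    return out

# Toy usage with hypothetical gates that halve with each coarser level.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 4)) for _ in range(3))
y = log_linear_attention_ref(q, k, v, lam=lambda t, level: 0.5 ** level)
print(y.shape)  # (16, 4)
```

The merge-equal-sized-buckets loop is the same binary-counter trick a Fenwick tree uses, so the number of live states after t tokens equals the number of one-bits in t.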

Applications in Architectures

Having established log-linear attention as a hierarchical extension of linear attention, we explore how this concept can be applied to two specific architectures: Mamba-2 and Gated DeltaNet.

[Figure: Mamba-2 and Gated DeltaNet]

Experimental Validation

Finally, we present experiments on MQAR (multi-query associative recall), a synthetic benchmark for in-context recall, which validate the effectiveness of Log-Linear Attention.

[Figure: MQAR experiments]


In conclusion, Log-Linear Attention carves out a useful middle ground between softmax attention and its linear-time variants, improving efficiency while preserving the attention-style structure of the models it extends. Further exploration and application of this approach may yield additional efficient sequence-modeling architectures.
