LG | AI | ML | Feb 5

ZeroS: Zero-Sum Linear Attention for Efficient Transformers

arXiv:2602.05230v1 | 3 citations | h-index: 4
Originality: Highly original
AI Analysis

The paper addresses efficiency and performance issues in Transformers for sequence modeling tasks, and presents a novel method rather than an incremental improvement.

It tackles the underperformance of linear attention methods in Transformers by proposing Zero-Sum Linear Attention (ZeroS). ZeroS removes two limitations, the restriction to convex combinations and the uniform accumulated weight bias, which enables contrastive operations. The authors report that it matches or exceeds standard softmax attention across sequence modeling benchmarks.

Linear attention methods offer Transformers $O(N)$ complexity but typically underperform standard softmax attention. We identify two fundamental limitations affecting these approaches: the restriction to convex combinations that only permits additive information blending, and uniform accumulated weight bias that dilutes attention in long contexts. We propose Zero-Sum Linear Attention (ZeroS), which addresses these limitations by removing the constant zero-order term $1/t$ and reweighting the remaining zero-sum softmax residuals. This modification creates mathematically stable weights, enabling both positive and negative values and allowing a single attention layer to perform contrastive operations. While maintaining $O(N)$ complexity, ZeroS theoretically expands the set of representable functions compared to convex combinations. Empirically, it matches or exceeds standard softmax attention across various sequence modeling benchmarks.
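As a rough illustration of the zero-sum idea in the abstract, the sketch below subtracts the uniform term $1/t$ from ordinary softmax weights over $t$ positions. It is not the paper's ZeroS layer: it omits the reweighting step and the linear-time formulation, and the function names (`softmax`, `zero_sum_residuals`, `mix`) and the toy scores and values are assumptions made for this example. It only shows that the residual weights sum to zero and take both signs, whereas softmax weights form a convex combination.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_sum_residuals(scores):
    """Softmax weights with the uniform term 1/t removed (hypothetical helper)."""
    t = len(scores)
    return [w - 1.0 / t for w in softmax(scores)]

def mix(weights, values):
    """Weighted sum of scalar values."""
    return sum(w * v for w, v in zip(weights, values))

# Toy inputs, chosen only for illustration.
scores = [2.0, 0.0, -1.0, 0.5]
values = [1.0, 3.0, -2.0, 4.0]

convex = softmax(scores)               # non-negative, sums to 1
residual = zero_sum_residuals(scores)  # mixed signs, sums to 0

convex_out = mix(convex, values)       # stays within [min(values), max(values)]
residual_out = mix(residual, values)   # equals convex_out minus the mean of values

print("softmax weights:  ", [round(w, 4) for w in convex])
print("residual weights: ", [round(w, 4) for w in residual])
print("sum of residuals: ", round(sum(residual), 12))
```

Because the residuals sum to zero, the weighted sum of values equals the softmax output minus the plain average of the values, so positions can be subtracted as well as added. This is one way to read the abstract's claim about contrastive operations.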

