LG · Dec 12, 2025

Sliced ReLU attention: Quasi-linear contextual expressivity via sorting

arXiv:2512.11411v2 · 1 citation · h-index: 13
Originality Highly original
AI Analysis

This paper targets the computational bottleneck of attention mechanisms on long sequences. It appears incremental in that it builds on existing attention paradigms, though its sorting-based approach is novel.

The authors introduce sliced ReLU attention, a new attention mechanism that achieves quasi-linear O(n log(n)) complexity through sorting, making it suitable for long contexts while preserving theoretical expressive power comparable to softmax attention.

We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and its approximation alternatives. Instead of applying a nonlinearity to pairwise dot products, we operate on one-dimensional projections of key-query differences and leverage sorting to obtain quasi-linear complexity. This construction yields a differentiable, non-symmetric kernel that can be computed in O(n log(n)) through a sorting procedure, making it suitable for very long contexts. Beyond computational benefits, the model retains strong theoretical expressive power: we establish two in-context expressivity results, previously known for softmax attention, showing that sliced ReLU attention preserves the ability to perform nontrivial sequence-to-sequence disentangling tasks and satisfies a contextual universal approximation property. Finally, we illustrate the potential practical interest of this kernel in small to medium-scale experiments.
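
The abstract does not spell out the kernel, so the following is only a sketch of why sorting can give quasi-linear cost. It assumes a single one-dimensional projection, scalar values, and a kernel of the form ReLU(q_i - k_j); the function names are hypothetical and this is not the authors' implementation. Under those assumptions, the sum over j of ReLU(q_i - k_j) v_j only involves keys below q_i, so it equals q_i times a prefix sum of v minus a prefix sum of k v over the sorted keys.

```python
from bisect import bisect_left
from itertools import accumulate


def relu_sum_naive(queries, keys, values):
    """O(n^2) reference: out[i] = sum_j max(q_i - k_j, 0) * v_j."""
    return [
        sum(max(q - k, 0.0) * v for k, v in zip(keys, values))
        for q in queries
    ]


def relu_sum_sorted(queries, keys, values):
    """Same quantity in O(n log n): one sort, two prefix sums, one binary search per query."""
    # Assumed setting: keys and queries are already 1-D projections (plain floats).
    order = sorted(range(len(keys)), key=keys.__getitem__)
    sorted_keys = [keys[j] for j in order]
    # prefix_v[m] is the sum of v over the m smallest keys; prefix_kv does the same for k * v.
    prefix_v = [0.0] + list(accumulate(values[j] for j in order))
    prefix_kv = [0.0] + list(accumulate(keys[j] * values[j] for j in order))
    out = []
    for q in queries:
        m = bisect_left(sorted_keys, q)  # number of keys strictly below q
        # Keys equal to q contribute ReLU(0) = 0, so excluding them is harmless.
        out.append(q * prefix_v[m] - prefix_kv[m])
    return out
```

Comparing `relu_sum_sorted` against `relu_sum_naive` on a few random inputs is a quick way to check the identity. A multi-slice variant would presumably repeat this over several projection directions and combine the results, but that combination is not described in the abstract.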

