Dec 3, 2025

Nexus: Higher-Order Attention Mechanisms in Transformers

arXiv:2512.03377v2
h-index: 14
AI Analysis

This work targets the bottleneck that limits how well a single Transformer attention layer can capture intricate relationships. It offers a parameter-efficient solution with broad applicability across AI domains.

The paper tackles the limitation of standard first-order attention in Transformers by proposing Nexus, a recursive architecture that enhances representational power through nested self-attention mechanisms, achieving improved performance on multiple benchmarks.

Transformers have achieved significant success across various domains, relying on self-attention to capture dependencies. However, the standard first-order attention mechanism is often limited by a low-rank bottleneck and struggles to capture intricate, multi-hop relationships within a single layer. In this paper, we propose Nexus, a novel architecture designed to enhance representational power through a recursive framework. Unlike standard approaches that use static linear projections for Queries and Keys, Nexus dynamically refines these representations via nested self-attention mechanisms. Specifically, the Query and Key vectors are themselves outputs of inner attention loops, allowing tokens to aggregate global context and model high-order correlations prior to the final attention computation. We enforce a parameter-efficient weight-sharing strategy across recursive steps, ensuring that this enhanced expressivity incurs only O(1) additional parameters. We provide theoretical analysis demonstrating that our method breaks the linear bottleneck of standard attention. Empirically, Nexus outperforms standard Transformers on multiple benchmarks.
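The abstract describes the nested Query/Key construction only in words. What follows is a minimal NumPy sketch under stated assumptions, not the paper's method. The assumptions: each inner loop is ordinary scaled dot-product self-attention whose values are the raw token embeddings; the context-mixed tokens are re-projected with the same W_Q and W_K at every level; and the names `nexus_qk`, `nexus_attention`, and the `depth` parameter are hypothetical. The sketch does not reproduce the paper's theoretical bottleneck analysis. It only checks one property the abstract states: in standard attention a token's query depends on that token alone, whereas a query produced by an inner attention loop depends on the rest of the sequence.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d_k)) v.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def standard_qk(x, w_q, w_k):
    # First-order attention: each q_i, k_i is a static linear map of x_i alone.
    return x @ w_q, x @ w_k

def nexus_qk(x, w_q, w_k, depth=1):
    # HYPOTHETICAL nested Q/K (an illustration, not the paper's exact equations).
    # Level 0 is the standard projection. At each deeper level, every token first
    # aggregates global context with an inner attention loop driven by the
    # previous level's Q/K, and the aggregated vectors are projected again.
    # w_q and w_k are reused at every level, so parameters do not grow with depth.
    q, k = standard_qk(x, w_q, w_k)
    for _ in range(depth):
        ctx = attention(q, k, x)        # inner loop: context-mixed tokens, (n, d_model)
        q, k = ctx @ w_q, ctx @ w_k     # refined queries and keys, (n, d_k)
    return q, k

def nexus_attention(x, w_q, w_k, w_v, depth=1):
    # Outer (final) attention uses the refined Q/K; values stay a plain projection.
    q, k = nexus_qk(x, w_q, w_k, depth)
    return attention(q, k, x @ w_v)

rng = np.random.default_rng(0)
n, d_model, d_k = 5, 8, 4
x = rng.normal(size=(n, d_model))
# Scaled init keeps attention logits moderate so the softmax is not one-hot.
w_q, w_k, w_v = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_k)) for _ in range(3))

# Perturb only the last token and compare the first token's query before and after.
x_pert = x.copy()
x_pert[-1] += 5.0
q_std, _ = standard_qk(x, w_q, w_k)
q_std_p, _ = standard_qk(x_pert, w_q, w_k)
q_nx, _ = nexus_qk(x, w_q, w_k, depth=2)
q_nx_p, _ = nexus_qk(x_pert, w_q, w_k, depth=2)
print("standard query unchanged:", np.allclose(q_std[0], q_std_p[0]))
print("nested query unchanged:  ", np.allclose(q_nx[0], q_nx_p[0]))
```

In this sketch the recursion adds no parameters at all, which trivially satisfies the O(1) bound; the paper's actual weight-sharing scheme may add a small fixed number of parameters. The cost is extra compute: each level of `depth` adds one more n × n attention pass before the final one.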

