LG · AI · CV · Aug 16, 2025

ENA: Efficient N-dimensional Attention

arXiv:2508.11921v1
Originality: Incremental advance
AI Analysis

The paper addresses the problem of modeling ultra-long high-order data in machine learning applications and represents an incremental improvement over existing methods.

The paper tackles efficient modeling of long sequences of high-order data by proposing ENA, a hybrid architecture that combines linear recurrence with tiled high-order sliding window attention. The authors report promising results and present ENA as a practical solution.

Efficient modeling of long sequences of high-order data requires a more efficient architecture than the Transformer. In this paper, we investigate two key aspects of extending linear recurrent models, especially those originally designed for language modeling, to high-order data (1D to ND): scanning strategies and attention-hybrid architectures. Empirical results suggest that scanning provides limited benefits, while attention-hybrid models yield promising results. Focusing on the latter, we further evaluate types of attention and find that tiled high-order sliding window attention (SWA) is efficient in both theory and practice. We term the resulting hybrid architecture of linear recurrence and high-order SWA Efficient N-dimensional Attention (ENA). We then conduct several experiments to demonstrate its effectiveness. The intuition behind ENA is that linear recurrence compresses global information into a state, while SWA complements it by enforcing strict local modeling. Together, they form a simple framework that offers a promising and practical solution for ultra-long high-order data modeling.
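
The intuition in the abstract (a fixed-size recurrent state for global context, plus a strictly local attention window) can be illustrated with a minimal NumPy sketch for 2D data. This is not the paper's implementation. Every function name here is hypothetical, and the scalar `decay`, the identity projections, and the residual sum in `hybrid_block` are assumptions chosen for brevity. The sketch also builds a dense mask over all token pairs, which is only practical for tiny grids; the tiled SWA the abstract describes as efficient is not shown.

```python
import numpy as np

def linear_recurrence(q, k, v, decay=0.9):
    """Causal linear-attention-style scan over a flattened sequence.

    q, k: (T, d_k); v: (T, d_v). The state has fixed size (d_k, d_v),
    so memory does not grow with sequence length T.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))
    out = np.empty((T, d_v))
    for t in range(T):
        # Compress everything seen so far into the state, then read it out.
        state = decay * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out

def window_mask_2d(height, width, window):
    """Boolean (H*W, H*W) mask for row-major flattened tokens.

    Token i may attend to token j only if j lies inside the
    window x window neighbourhood centred on i (window assumed odd).
    """
    rows, cols = np.divmod(np.arange(height * width), width)
    radius = window // 2
    near_rows = np.abs(rows[:, None] - rows[None, :]) <= radius
    near_cols = np.abs(cols[:, None] - cols[None, :]) <= radius
    return near_rows & near_cols

def sliding_window_attention_2d(q, k, v, height, width, window):
    """Softmax attention restricted to a 2D local window (dense, unoptimized)."""
    mask = window_mask_2d(height, width, window)
    scores = (q @ k.T) / np.sqrt(q.shape[1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

def hybrid_block(x, height, width, window=3):
    """Assumed layout: a recurrence layer followed by a local attention layer."""
    y = x + linear_recurrence(x, x, x)
    return y + sliding_window_attention_2d(y, y, y, height, width, window)
```

For example, calling `hybrid_block(x, 4, 4)` on an array `x` of shape `(16, d)` returns an array of the same shape. On such a grid with `window=3`, a corner token attends to 4 positions and an interior token to 9, regardless of how large the grid becomes.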

