LGSep 24, 2025

Myosotis: structured computation for attention like layer

Evgenii Egorov, Hanno Ackermann, Markus Nagel, Hong Cai

arXiv:2509.20503v17.11 citationsh-index: 1

Originality Incremental advance

AI Analysis

This addresses a bottleneck in sequence modeling for AI applications, but appears incremental as it builds on prior methods like SSM and sparsity techniques.

The paper tackles the quadratic scaling of attention layers in memory and computation by proposing a novel algorithm that combines sparsity and recurrent dependence through efficient inversion of tree-structured matrices, aiming to mitigate disadvantages of existing approaches.

Attention layers apply a sequence-to-sequence mapping whose parameters depend on the pairwise interactions of the input elements. However, without any structural assumptions, memory and compute scale quadratically with the sequence length. The two main ways to mitigate this are to introduce sparsity by ignoring a sufficient amount of pairwise interactions or to introduce recurrent dependence along them, as SSM does. Although both approaches are reasonable, they both have disadvantages. We propose a novel algorithm that combines the advantages of both concepts. Our idea is based on the efficient inversion of tree-structured matrices.

View on arXiv PDF

Similar