LG · AI · Sep 4, 2025

Attention as an Adaptive Filter

arXiv:2509.04154v3 · 1 citation
Originality: Incremental advance
AI Analysis

For machine learning researchers working on attention-based models, this is an incremental advance: it offers a novel mathematical formulation but limited practical impact.

The authors introduce Adaptive Filter Attention (AFA), which models the input sequence as discrete observations of a linear stochastic differential equation and uses that dynamics model to compute attention weights. A simplified variant has the same computational and memory complexity as standard attention. In the limit of zero decay and process noise, and with a small-angle approximation, the method recovers a complex-valued generalization of dot-product attention with rotary positional encodings.

We introduce Adaptive Filter Attention (AFA), a novel attention mechanism that incorporates a learnable dynamics model directly into the computation of attention weights. Rather than comparing queries and keys directly, we model the input sequence as discrete observations of a linear stochastic differential equation (SDE). By assuming a continuous-time linear time-invariant system with simultaneously-diagonalizable state matrices and noise covariances, we can make use of a closed-form solution of the differential Lyapunov equation to efficiently propagate uncertainties through the dynamics from keys to queries. A generalization of attention naturally arises as the maximum likelihood solution for filtering the trajectory of this linear SDE, with attention weights corresponding to robust residual-based reweightings of the propagated query-key precisions. We further constrain the system dynamics and noise in order to obtain a simplified variant with the same computational and memory complexity as standard attention. In the limit of zero decay and process noise, and using a small-angle approximation, we recover a complex-valued generalization of ordinary dot-product attention with rotary positional encodings.
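As a rough illustration of the uncertainty propagation described in the abstract, the sketch below works through the scalar (fully diagonalized) case in plain Python. It assumes a single complex mode obeying dx = (-alpha + i*omega) x dt + sqrt(q) dW, for which the differential Lyapunov equation reduces to dp/dt = -2*alpha*p + q and has the closed form p(t) = exp(-2*alpha*t) p(0) + q (1 - exp(-2*alpha*t)) / (2*alpha). The names `propagate`, `euler_variance`, `alpha`, `omega`, and `q` are hypothetical and chosen for illustration; this is not the paper's implementation, and it does not include the residual-based reweighting step.

```python
import cmath
import math


def propagate(mean, var, dt, alpha, omega, q):
    """Propagate mean and variance of one complex mode over a time gap dt.

    Assumed model (illustrative): dx = (-alpha + i*omega) x dt + sqrt(q) dW.
    The variance uses the closed-form solution of dp/dt = -2*alpha*p + q.
    """
    decay = math.exp(-2.0 * alpha * dt)
    new_mean = cmath.exp(complex(-alpha, omega) * dt) * mean
    if alpha > 0.0:
        noise = q * (1.0 - decay) / (2.0 * alpha)
    else:
        noise = q * dt  # limit of the expression above as alpha -> 0
    return new_mean, decay * var + noise


def euler_variance(var, dt, alpha, q, steps=20_000):
    """Numerically integrate dp/dt = -2*alpha*p + q as a sanity check."""
    h = dt / steps
    for _ in range(steps):
        var += h * (-2.0 * alpha * var + q)
    return var


# Closed form versus brute-force integration for a decaying, noisy mode.
_, closed = propagate(1.0 + 0.0j, 0.5, dt=2.0, alpha=0.3, omega=1.0, q=0.2)
print(closed, euler_variance(0.5, 2.0, 0.3, 0.2))

# Zero decay and zero process noise: the key is only rotated by exp(i*omega*dt)
# and its variance is unchanged, which is the rotary-like limit.
rotated, same_var = propagate(2.0 + 1.0j, 0.5, dt=3.0, alpha=0.0, omega=0.7, q=0.0)
print(abs(rotated), same_var)
```

With `alpha = 0` and `q = 0` the propagated key differs from the original only by a phase that depends on the time gap, which is consistent with the abstract's statement that rotary positional encodings appear in that limit.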

