LG · May 7

Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators

arXiv:2605.06997 · 55.8

AI Analysis

This work addresses the memory bottleneck in long-context Transformer inference and the retrieval failure in state-space models, offering a constant-memory solution for associative recall tasks.
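The following back-of-the-envelope comparison makes the memory contrast concrete. It uses a hypothetical decoder configuration (32 layers, 8 KV heads, head dimension 128, fp16) and assumes, for illustration only, that the constant-size state amounts to two $r \times r$ statistics per layer with $r = 64$. Neither figure comes from the paper, and the sketch ignores the SSM block's own recurrent state.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes held by a standard KV cache: one key and one value per token, head, and layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def constant_state_bytes(n_layers: int, r: int, bytes_per_elem: int = 2) -> int:
    """Bytes for a hypothetical O(r^2) state: two r x r matrices per layer.

    This layout is an assumption for illustration; note that seq_len does not appear.
    """
    return 2 * n_layers * r * r * bytes_per_elem


# Hypothetical fp16 decoder: 32 layers, 8 KV heads, head_dim 128; projection rank r = 64.
for seq_len in (4_096, 32_768, 131_072):
    kv = kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128)
    state = constant_state_bytes(n_layers=32, r=64)
    print(f"{seq_len:>7} tokens: KV cache {kv / 2**30:6.2f} GiB, "
          f"O(r^2) state {state / 2**10:.0f} KiB")
```

In this configuration the KV cache grows by 128 KiB per token, while the assumed state stays fixed at 512 KiB however long the trace becomes.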

Echo introduces Spectral Koopman Attention (SKA), a KV-cache-free associative recall mechanism that augments state-space models with a closed-form dynamical operator. At the 50M-parameter scale, SKA-augmented models reach 100% retrieval accuracy on the Multi-Query Associative Recall (MQAR) benchmark across every gap length and KV-pair count tested, while a pure Mamba-2 SSM does not exceed chance accuracy (~3%).

Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architecture built around Spectral Koopman Attention (SKA): a drop-in replacement for attention layers that augments SSM blocks with a closed-form dynamical operator whose sufficient statistics are accumulated in constant memory, with no KV cache. Echo fits a spectral linear system to the key and value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from $O(r^{2})$ streaming state, where $r$ is a small projection rank. On the Multi-Query Associative Recall benchmark, a pure Mamba-2 SSM fails to exceed chance accuracy (${\sim}3\%$) across all gap lengths and KV-pair counts, while at the 50M-parameter scale SKA-augmented models achieve $100\%$ retrieval accuracy on every configuration tested, including distractor gaps of $4{,}096$ tokens with $32$ KV pairs. Across five additional transfer benchmarks, including needle-in-a-haystack, tool-trace, and multi-hop retrieval, SKA consistently outperforms both pure SSM and SSM+Attention hybrids while maintaining constant inference memory. Ablations confirm that the spectral operator, not the prefix masking strategy, drives the retrieval gain.
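The core idea of accumulating sufficient statistics in constant memory and reading them out through a closed-form ridge solve can be illustrated with a minimal sketch. It assumes a linear kernel on keys that are already projected to rank $r$, uses the hypothetical names `StreamingRidgeMemory`, `write`, and `read`, and omits both the learned power-iterated filter and any Koopman spectral structure. It is a generic streaming ridge-regression memory, not Echo's SKA implementation.

```python
import numpy as np


class StreamingRidgeMemory:
    """Hypothetical constant-memory associative store (not Echo's SKA).

    Keeps only two r x r sufficient statistics, so memory is O(r^2)
    no matter how many (key, value) pairs have been streamed in.
    """

    def __init__(self, r: int, lam: float = 1e-3) -> None:
        self.lam = lam                 # ridge regularizer (assumed value)
        self.kk = np.zeros((r, r))     # running sum of k k^T
        self.vk = np.zeros((r, r))     # running sum of v k^T

    def write(self, k: np.ndarray, v: np.ndarray) -> None:
        # Rank-1 updates; no per-token history is retained.
        self.kk += np.outer(k, k)
        self.vk += np.outer(v, k)

    def read(self, q: np.ndarray) -> np.ndarray:
        # Ridge readout W q, with W = vk @ inv(kk + lam * I).
        # Solve (kk + lam * I) x = q instead of forming the inverse.
        r = self.kk.shape[0]
        x = np.linalg.solve(self.kk + self.lam * np.eye(r), q)
        return self.vk @ x


# Toy recall test with random keys/values already in the rank-r space (assumption).
rng = np.random.default_rng(0)
r, n_pairs = 64, 32
keys = rng.standard_normal((n_pairs, r)) / np.sqrt(r)
values = rng.standard_normal((n_pairs, r)) / np.sqrt(r)

mem = StreamingRidgeMemory(r)
for k, v in zip(keys, values):
    mem.write(k, v)

# Decode each recalled vector to the stored value with the largest dot product.
recalled = np.array([mem.read(k) for k in keys])
pred = np.argmax(recalled @ values.T, axis=1)
accuracy = float(np.mean(pred == np.arange(n_pairs)))
print(f"recall accuracy: {accuracy:.2f}")
```

Because the system is solved against each query rather than inverted once, every read costs one $r \times r$ solve, and the stored state stays at two $r \times r$ matrices however many pairs are written. Once the number of pairs exceeds $r$, the keys can no longer be linearly independent and exact recall is no longer guaranteed, so in this sketch a small $r$ trades capacity for memory.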
