CL, Oct 21, 2025

Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference

arXiv:2510.18413v1, 2 citations, h-index: 4
Originality: Highly original
AI Analysis

Adamas targets decoding latency in long-context applications such as document summarization and multi-turn dialogue, and supports up to 8x higher sparsity than prior state-of-the-art sparse attention methods.

The paper tackles the quadratic cost of self-attention in large language models for long-context inference by introducing Adamas, a sparse attention mechanism that matches full attention accuracy with a 64-token budget and achieves up to 4.4x self-attention speedups on 32K-length sequences.

Large language models (LLMs) now support context windows of hundreds of thousands to millions of tokens, enabling applications such as long-document summarization, large-scale code synthesis, multi-document question answering and persistent multi-turn dialogue. However, such extended contexts exacerbate the quadratic cost of self-attention, leading to severe latency in autoregressive decoding. Existing sparse attention methods alleviate these costs but rely on heuristic patterns that struggle to recall critical key-value (KV) pairs for each query, resulting in accuracy degradation. We introduce Adamas, a lightweight yet highly accurate sparse attention mechanism designed for long-context inference. Adamas applies the Hadamard transform, bucketization and 2-bit compression to produce compact representations, and leverages Manhattan-distance estimation for efficient top-k selections. Experiments show that Adamas matches the accuracy of full attention with only a 64-token budget, achieves near-lossless performance at 128, and supports up to 8x higher sparsity than prior state-of-the-art (SOTA) methods while delivering up to 4.4x self-attention and 1.5x end-to-end speedups on 32K-length sequences. Remarkably, Adamas attains comparable or even lower perplexity than full attention, underscoring its effectiveness in maintaining accuracy under aggressive sparsity.
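The pipeline named in the abstract (Hadamard transform, bucketization, 2-bit compression, Manhattan-distance top-k selection) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the bucket thresholds, the byte-packing layout, and the choice to treat a smaller Manhattan distance between codes as a proxy for higher relevance are all assumptions made for illustration. A real system would presumably compute and cache the key codes once rather than recomputing them per query.

```python
def hadamard_transform(vec):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def bucketize_2bit(vec, thresholds):
    """Map each value to a bucket id in {0, 1, 2, 3}, which fits in 2 bits.

    The three thresholds are hypothetical; the abstract does not state them.
    """
    lo, mid, hi = thresholds
    return [int(x > lo) + int(x > mid) + int(x > hi) for x in vec]


def pack_codes(codes):
    """Pack 2-bit codes four per byte (an assumed storage layout)."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def manhattan(a, b):
    """L1 distance between two equal-length code vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_top_k(query, keys, k, thresholds=(-1.0, 0.0, 1.0)):
    """Return indices of the k keys whose codes are closest to the query's code.

    Assumption: a smaller Manhattan distance stands in for higher relevance.
    """
    q_code = bucketize_2bit(hadamard_transform(query), thresholds)
    scored = []
    for idx, key in enumerate(keys):
        k_code = bucketize_2bit(hadamard_transform(key), thresholds)
        scored.append((manhattan(q_code, k_code), idx))
    scored.sort()  # ties are broken by the lower key index
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    keys = [[1, 2, 3, 4], [-1, -2, -3, -4], [4, 3, 2, 1]]
    print(select_top_k([1, 2, 3, 4], keys, k=2))
```

Full attention would then be computed only over the selected key-value pairs, so `k` plays the role of the token budget (64 or 128 in the reported experiments).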

