LG · AI · Apr 18, 2025

CacheFormer: High Attention-Based Segment Caching

arXiv:2504.13981v1 · 1 citation · h-index: 24 · AI
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reducing the quadratic time complexity of attention over long contexts in language models. It is an incremental improvement over existing methods.
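To make the quadratic cost concrete, here is an illustrative count of query-key scores for one attention head in one layer. The context length, window size, and segment length are assumed values, and `attention_score_counts` is a hypothetical helper; none of these are figures or code from the paper.

```python
def attention_score_counts(n, window, seg_len):
    """Illustrative query-key score counts for one head in one layer.

    n: context length, window: sliding-window size, seg_len: tokens per
    compressed segment. All three are assumed values, not the paper's settings.
    """
    # Ceil division: a partial last segment still gets one summary key.
    num_segments = (n + seg_len - 1) // seg_len
    return {
        # Full attention: every query scores every key, O(n^2).
        "full": n * n,
        # Sliding window: at most `window` keys per query (upper bound), O(n * w).
        "sliding_window": n * window,
        # Compressed segments: one summary key per segment, O(n^2 / s).
        "compressed_segments": n * num_segments,
    }


counts = attention_score_counts(n=8192, window=512, seg_len=32)
for name, c in counts.items():
    print(f"{name:>20}: {c:,}")
```

With these assumed values, full attention computes 16 times as many scores as the sliding window and 32 times as many as attention over compressed segments.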

The paper tackles efficient handling of long contexts in transformer-based language models. It proposes CacheFormer, which divides a long context into segments and, when a segment receives high attention at the compressed level, retrieves nearby segments in uncompressed form. The authors report an average perplexity improvement of 8.5% over models of similar size.
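A minimal sketch of the retrieval step summarized above, for a single query vector. It assumes mean pooling as the segment compression (the paper's actual compression may differ), and `retrieve_segments`, `top_k`, and `neighbors` are hypothetical names. Segments are ranked by softmax attention over the compressed keys, the top-k are expanded with their adjacent segments (echoing the cache idea of fetching neighbouring data on a miss), and the uncompressed token indices of the result are returned.

```python
import math


def segment_means(keys, seg_len):
    """Compress keys by mean-pooling each contiguous segment.

    ASSUMPTION: mean pooling stands in for whatever compression the paper uses.
    """
    compressed = []
    for start in range(0, len(keys), seg_len):
        block = keys[start:start + seg_len]
        dim = len(block[0])
        compressed.append([sum(row[d] for row in block) / len(block) for d in range(dim)])
    return compressed


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_segments(query, keys, seg_len, top_k, neighbors=1):
    """Hypothetical helper: pick high-attention segments plus adjacent ones.

    Returns (sorted segment indices, uncompressed token indices to attend to).
    """
    compressed = segment_means(keys, seg_len)
    scale = 1.0 / math.sqrt(len(query))
    # Scaled dot-product attention of the query over the compressed segments.
    probs = softmax([scale * sum(q * k for q, k in zip(query, c)) for c in compressed])
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    selected = set()
    for i in top:
        # Mirroring the cache analogy: also fetch the segments next to each hit.
        for j in range(i - neighbors, i + neighbors + 1):
            if 0 <= j < len(compressed):
                selected.add(j)
    segments = sorted(selected)
    tokens = [t for j in segments
              for t in range(j * seg_len, min((j + 1) * seg_len, len(keys)))]
    return segments, tokens


keys = [[0, 0], [0, 0], [0, 1], [0, 1], [5, 5], [5, 5], [0, 0], [1, 0]]
segments, tokens = retrieve_segments([1.0, 1.0], keys, seg_len=2, top_k=1)
print(segments, tokens)  # [1, 2, 3] [2, 3, 4, 5, 6, 7]
```

Here segment 2 has the highest compressed score, so segments 1 to 3 are retrieved and tokens 2 to 7 are attended to in uncompressed form.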

Efficiently handling long contexts in transformer-based language models with low perplexity is an active area of research. Numerous recent approaches, such as Linformer, Longformer, Performer, and structured state space models (SSMs), have not fully resolved this problem. All these models strive to reduce the quadratic time complexity of the attention mechanism while minimizing the loss in quality caused by effectively compressing the long context. Inspired by the cache and virtual memory principle in computers, where on a cache miss not only the needed data but also adjacent data is fetched from memory, we apply this concept to long contexts by dividing them into small segments. In our design, when high segment-level attention occurs at the compressed level, we retrieve the nearby segments in uncompressed form. Our enhancements for handling long contexts aggregate four attention mechanisms: short sliding window attention, long compressed segmented attention, dynamic retrieval of the top-k high-attention uncompressed segments, and overlapping segments in long segment attention to avoid segment fragmentation. These enhancements result in an architecture that outperforms existing SOTA architectures, with an average perplexity improvement of 8.5% over models of similar size.
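Two of the four mechanisms listed above, the short sliding window and the overlapping segments, come down to index bookkeeping, sketched below under stated assumptions. Overlap is modelled as a second segmentation shifted by half a segment, which is one plausible reading of "overlapping segments" rather than the paper's confirmed scheme; all function names are hypothetical.

```python
def segment_bounds(n, seg_len, offset=0):
    """Half-open (start, end) segment boundaries over n tokens.

    With offset > 0, a short leading segment covers tokens [0, offset).
    """
    bounds = [(0, min(offset, n))] if offset > 0 else []
    for start in range(offset, n, seg_len):
        bounds.append((start, min(start + seg_len, n)))
    return bounds


def overlapping_segmentations(n, seg_len):
    """ASSUMPTION: model "overlapping segments" as a second segmentation shifted by
    half a segment, so tokens within half a segment of an aligned boundary
    share a segment in the shifted copy instead of being split apart."""
    return segment_bounds(n, seg_len), segment_bounds(n, seg_len, offset=seg_len // 2)


def sliding_window_keys(i, window):
    """Causal sliding window: query i attends to keys max(0, i - window + 1) .. i."""
    return list(range(max(0, i - window + 1), i + 1))


aligned, shifted = overlapping_segmentations(n=10, seg_len=4)
print(aligned)   # [(0, 4), (4, 8), (8, 10)]
print(shifted)   # [(0, 2), (2, 6), (6, 10)]
print(sliding_window_keys(5, window=3))  # [3, 4, 5]
```

Tokens 3 and 4 fall in different aligned segments but share the shifted segment (2, 6); under this model, that is the fragmentation the overlap is meant to avoid.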
