CL | AI | Oct 25, 2025

Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows

arXiv:2510.22109v1 | 1 citation | h-index: 2
Originality: Incremental advance
AI Analysis

The paper offers a simple method for enhancing long-range memory in transformers. The advance is incremental because it modifies the input representation rather than the architecture.

The paper tackles the problem of extending transformer context windows by applying logarithmic compression to the input tokens. It reports lower perplexity than uncompressed baselines on the WikiText-103 and PG-19 benchmarks, with performance improving as the compressed context grows longer.

Most approaches to long-context processing increase the complexity of the transformer's internal architecture by integrating mechanisms such as recurrence or auxiliary memory modules. In this work, we introduce an alternative approach that modifies the input representation itself, rather than the transformer architecture. Inspired by cognitive models of human memory, our method applies a scale-invariant logarithmic compression to the input tokens. The resulting compressed representation is processed by a standard, unmodified transformer, preserving architectural simplicity. We evaluate this approach on the WikiText-103 and PG-19 language modeling benchmarks, showing a reduction in perplexity compared to uncompressed baselines. Moreover, performance improves consistently with longer compressed temporal contexts, showing that input-level logarithmic compression is a simple and effective way to extend a transformer's long-range memory.
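The abstract does not spell out how the compression is computed, so the following is only a rough sketch of one way input-level logarithmic compression could look, not the authors' implementation. It assumes that the past is split into bins whose width grows geometrically with distance from the most recent token, and that the embeddings inside each bin are mean-pooled into a single vector. The function names `log_spaced_bins` and `compress`, the `base` parameter, and the choice of mean-pooling are all hypothetical.

```python
from typing import List, Sequence, Tuple


def log_spaced_bins(length: int, base: float = 2.0) -> List[Tuple[int, int]]:
    """Split positions 0..length-1 into bins that widen with distance into the past.

    Returns half-open (start, end) index ranges ordered oldest to newest.
    Assumption: bin widths grow geometrically (1, base, base**2, ...) going
    backward from the newest token; the paper may use a different scheme.
    """
    if base <= 1.0:
        raise ValueError("base must be greater than 1")
    bins = []
    end = length   # exclusive end of the next bin, starting at the newest token
    width = 1.0
    while end > 0:
        start = max(0, end - int(width))  # the oldest bin may be truncated
        bins.append((start, end))
        end = start
        width *= base
    bins.reverse()
    return bins


def compress(embeddings: Sequence[Sequence[float]], base: float = 2.0) -> List[List[float]]:
    """Mean-pool the embeddings inside each log-spaced bin (hypothetical pooling choice)."""
    compressed = []
    for start, end in log_spaced_bins(len(embeddings), base):
        chunk = embeddings[start:end]
        dim = len(chunk[0])
        pooled = [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        compressed.append(pooled)
    return compressed


if __name__ == "__main__":
    # Seven one-dimensional "embeddings"; the newest token keeps full resolution.
    toy = [[float(i)] for i in range(7)]
    print(log_spaced_bins(len(toy)))
    print(compress(toy))
```

Under these assumptions, the number of compressed slots grows roughly with the logarithm of the raw context length, so recent tokens keep fine resolution while distant ones are summarized coarsely. The compressed sequence could then be passed to a standard, unmodified transformer, as the abstract describes.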
