LG | Aug 23, 2025

Learned Structure in Cartridges: Keys as Shareable Routers in Self-Studied Representations

arXiv:2508.17032v2

Originality: Incremental advance

AI Analysis

This work addresses the bottleneck of linearly growing KV cache memory in LLM inference and offers insights for optimization. It is incremental in that it builds on existing Cartridge work.

The paper investigates the learned structure of Cartridges, a method for compressing KV caches in long-context LLMs, showing that keys act as shareable routers and compression occurs in values, with ablation of keys between tasks causing little performance loss.

A bottleneck for long-context LLM inference is the linearly growing KV cache. Recent work has proposed Cartridges, an approach which leverages offline compute to train a much smaller KV cache than is typically required for a full document (up to 40x less memory usage at inference time). In this paper, we present the first mechanistic exploration of the learned Cartridge key-value cache structure. In particular, we propose that (1) Cartridge keys act as stable, shareable retrieval routers for the compressed corpora and (2) most of the learned compression occurs within the Cartridge value vectors. We present empirical evidence of our routing theory across tasks, model families, and model sizes; for example, we can ablate the learned Cartridge key vectors between tasks with little performance loss. Finally, we propose a slight improvement in initialization called Sampled Chunk Initialization (SCI). We suggest that SCI can lead to faster Cartridge convergence than previously demonstrated in the literature. Our findings lay the groundwork for broader empirical study of Cartridge training optimization which may be crucial for further scaling.
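To make the memory claim concrete, here is a minimal sketch of the standard KV cache size arithmetic: one key vector and one value vector are stored per token, per layer, per KV head, so the footprint grows linearly with the number of tokens. The model shape (32 layers, 8 KV heads, head dimension 128, 2-byte values) and the 100,000-token document are hypothetical and chosen only for illustration; they are not taken from the paper. The 40x factor is the upper bound quoted in the abstract.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to store keys and values for num_tokens tokens."""
    # Factor of 2: one key vector and one value vector per token, layer, and KV head.
    return 2 * num_tokens * num_layers * num_kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, for illustration only (not from the paper).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
DOC_TOKENS = 100_000  # hypothetical document length

full = kv_cache_bytes(DOC_TOKENS, LAYERS, KV_HEADS, HEAD_DIM)
# A trained cache with 40x fewer token slots (the "up to 40x" figure in the abstract).
compressed = kv_cache_bytes(DOC_TOKENS // 40, LAYERS, KV_HEADS, HEAD_DIM)

print(f"full cache:       {full / 2**30:.2f} GiB")
print(f"compressed cache: {compressed / 2**30:.2f} GiB")
```

Because the size is linear in the token count, shrinking the number of cached slots by a factor of 40 shrinks the memory footprint by the same factor under these assumptions.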
