CL AI Jul 11, 2025

KV Cache Steering for Controlling Frozen LLMs

arXiv:2507.08799v2 · 4 citations · h-index: 67
Originality: Incremental advance
AI Analysis

Cache steering is a practical tool for behavior-level guidance of language models. It enables controllable transfer of reasoning styles and, according to the authors, improves on prior activation steering techniques in inference latency and ease of integration. The contribution is incremental because it builds on those techniques.

The authors address the problem of controlling frozen large language models (LLMs) without fine-tuning or prompt modifications. They propose cache steering, a lightweight one-shot intervention applied to the key-value cache, and report improvements in both the qualitative structure of reasoning and quantitative task performance, including on benchmarks such as GPQA and MATH.

We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach constructs steering vectors from reasoning traces, obtained either from teacher models (e.g., GPT-4o) or existing human annotations, that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Additional experiments show that the method also scales to larger models and yields further gains on challenging datasets such as GPQA and MATH. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of inference latency, hyperparameter stability, and ease of integration with existing inference APIs. Beyond mere reasoning induction, we show that cache steering enables controllable transfer of reasoning styles (e.g., stepwise, causal, analogical), making it a practical tool for behavior-level guidance of language models.
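
To make the one-shot intervention described in the abstract concrete, here is a minimal toy sketch in plain Python. It rests on assumptions that the abstract does not state: steering vectors are taken as the mean difference between cached vectors from paired examples with and without a reasoning trace (a common construction in activation steering), and they are added once to the cached keys and values at a single position. The names `mean_difference`, `steer_cache`, `c_k`, and `c_v` are hypothetical, and the cache is a nested list for one layer and one head, not a real model's cache object.

```python
from statistics import fmean


def mean_difference(positive, negative):
    """Average the element-wise differences between paired vectors.

    positive[i] and negative[i] are assumed to come from the same prompt,
    with and without a reasoning trace (an illustrative assumption).
    """
    dim = len(positive[0])
    return [fmean(p[i] - n[i] for p, n in zip(positive, negative))
            for i in range(dim)]


def steer_cache(cache, key_vec, value_vec, c_k=1.0, c_v=1.0, position=-1):
    """Return a copy of the toy cache with one position shifted once.

    c_k and c_v are hypothetical scaling coefficients for keys and values.
    """
    keys = [row[:] for row in cache["keys"]]
    values = [row[:] for row in cache["values"]]
    keys[position] = [k + c_k * s for k, s in zip(keys[position], key_vec)]
    values[position] = [v + c_v * s for v, s in zip(values[position], value_vec)]
    return {"keys": keys, "values": values}


# Toy cached vectors from two paired examples (made-up numbers).
pos_keys, neg_keys = [[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 2.0]]
pos_vals, neg_vals = [[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]

key_vec = mean_difference(pos_keys, neg_keys)
value_vec = mean_difference(pos_vals, neg_vals)

# Toy prompt cache: two token positions, two dimensions each.
cache = {"keys": [[0.0, 0.0], [1.0, 1.0]], "values": [[0.0, 0.0], [1.0, 1.0]]}

# Apply the shift once to the last cached position; decoding would then
# proceed from the modified cache with no further intervention.
steered = steer_cache(cache, key_vec, value_vec, c_k=2.0, c_v=0.5)
```

Because the cache is modified once before generation rather than at every decoding step, the per-token cost of generation is unchanged in this sketch, which is consistent with the latency advantage the abstract claims over continuous interventions.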

