LG · Sep 20, 2025

DISCO: Disentangled Communication Steering for Large Language Models

arXiv:2509.16820v1 · 3 citations · h-index: 8
Originality: Highly original
AI Analysis

This work addresses the challenge of fine-grained control over LLM outputs for researchers and practitioners, offering a novel approach to steering vector methods.

The paper tackles the problem of guiding large language model outputs at inference time. It proposes DISCO, a method that injects steering vectors directly into the query and value representation spaces within attention heads. On LLaMA 3.1 8B and Gemma 2 9B, DISCO's steering efficacy scores are up to 19.1% higher than those of the runner-up baseline.

A variety of recent methods guide large language model outputs via the inference-time addition of steering vectors to residual-stream or attention-head representations. In contrast, we propose to inject steering vectors directly into the query and value representation spaces within attention heads. We provide evidence that a greater portion of these spaces exhibit high linear discriminability of concepts (a key property motivating the use of steering vectors) than attention head outputs. We analytically characterize the effect of our method, which we term DISentangled COmmunication (DISCO) Steering, on attention head outputs. Our analysis reveals that DISCO disentangles a strong but underutilized baseline, steering attention inputs, which implicitly modifies queries and values in a rigid manner. In contrast, DISCO's direct modulation of these components enables more granular control. We find that DISCO achieves superior performance over a number of steering vector baselines across multiple datasets on LLaMA 3.1 8B and Gemma 2 9B, with steering efficacy scoring up to 19.1% higher than the runner-up. Our results support the conclusion that the query and value spaces are powerful building blocks for steering vector methods.
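The relationship between steering attention inputs and steering queries and values can be illustrated with a toy single-head example. The sketch below is not the paper's implementation: it assumes one attention head with bias-free linear projections, no normalization layer, no positional encoding, and the same steering vector added at every position. All names (`head_output`, `disco_steer`, `alpha_q`, `alpha_v`) and the random weights are hypothetical. Under these assumptions, adding `alpha * s` to the attention input gives the same head output as shifting queries by `alpha * (s @ W_q)` and values by `alpha * (s @ W_v)`, because a constant shift of all keys cancels inside the softmax. Decoupling the two scales is the extra freedom the abstract describes as more granular control.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def head_output(q, k, v):
    # Scaled dot-product attention for one head (no mask, toy setting).
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

def disco_steer(q, k, v, s_q, s_v, alpha_q, alpha_v):
    # Hypothetical helper: add separately scaled steering vectors to the
    # queries and values; keys are left untouched.
    return head_output(q + alpha_q * s_q, k, v + alpha_v * s_v)

rng = np.random.default_rng(0)
T, d_model, d_head = 5, 16, 4                # toy sizes (assumed)
x = rng.normal(size=(T, d_model))            # attention inputs
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
s = rng.normal(size=d_model)                 # steering vector in input space
alpha = 2.0

q, k, v = x @ W_q, x @ W_k, x @ W_v
base = head_output(q, k, v)

# Baseline: steer the attention input, then project to q, k, v.
xs = x + alpha * s
out_input = head_output(xs @ W_q, xs @ W_k, xs @ W_v)

# Same effect expressed as query/value steering with one tied scale.
out_tied = disco_steer(q, k, v, s @ W_q, s @ W_v, alpha, alpha)

# Decoupled scales: query and value strengths chosen independently.
out_decoupled = disco_steer(q, k, v, s @ W_q, s @ W_v, alpha_q=0.5, alpha_v=2.0)

# Value-only steering shifts every output row by alpha_v * s_v,
# since each row of attention weights sums to one.
out_value_only = disco_steer(q, k, v, s @ W_q, s @ W_v, alpha_q=0.0, alpha_v=alpha)

print(np.allclose(out_input, out_tied))
print(np.allclose(out_value_only - base, alpha * (s @ W_v)))
```

In this sketch the query and value steering directions are derived from a single input-space vector only to make the equivalence visible. A method that works directly in the query and value spaces is free to choose those directions, as well as their scales, independently. With normalization layers or rotary position embeddings the exact equivalence shown here would not hold as written.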
