LG · CL · May 6

Conceptors for Semantic Steering

arXiv:2605.04980 · 76.51 citations
Predicted impact: top 18% in LG · last 90 days
Originality: Highly original
AI Analysis

For practitioners of LLM steering, conceptors offer a geometrically principled, compositional, and safer alternative to single-direction steering from limited contrastive pairs.

The paper introduces conceptors, soft projection matrices estimated from activations pooled across both poles of a bipolar concept, for steering LLM behavior. Conceptors preserve the concept's full multidimensional subspace and match or outperform additive baselines at layers where concept subspaces are multi-dimensional, with substantially fewer degenerate outputs. The conceptor quota also serves as a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96.

Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.
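As a rough illustration of the objects named in the abstract, the sketch below builds a conceptor from pooled activations, computes its quota, and implements the Boolean operations. It uses the standard conceptor definitions from the reservoir-computing literature (C = R(R + α⁻²I)⁻¹, quota = trace(C)/d, and the AND formula for invertible conceptors). The paper's own estimator, aperture choice, and steering rule are not given here, so the function names, the `aperture` and `beta` parameters, the `steer` blend, and the synthetic activations are all assumptions made for illustration.

```python
import numpy as np

def correlation(X):
    """Uncentered correlation matrix of activations X, shape (n_samples, d)."""
    return X.T @ X / X.shape[0]

def conceptor_from_corr(R, aperture):
    """Soft projection C = R (R + aperture^-2 I)^-1 for a correlation matrix R."""
    d = R.shape[0]
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def conceptor(X, aperture=1.0):
    """Conceptor of a set of activations (aperture value is an assumption)."""
    return conceptor_from_corr(correlation(X), aperture)

def quota(C):
    """Mean eigenvalue of C: roughly the fraction of dimensions C claims."""
    return np.trace(C) / C.shape[0]

def NOT(C):
    return np.eye(C.shape[0]) - C

def AND(C, B):
    # Standard formula, valid only when both C and B are invertible.
    I = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def OR(C, B):
    # De Morgan: C OR B = NOT(NOT C AND NOT B).
    return NOT(AND(NOT(C), NOT(B)))

def steer(h, C, beta=1.0):
    """Hypothetical steering rule: blend h with its soft projection C @ h."""
    return (1.0 - beta) * h + beta * (C @ h)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    offset = np.zeros(d)
    offset[0] = 2.0  # synthetic "concept axis", for illustration only
    pos = rng.normal(size=(200, d)) + offset  # stand-in for positive-pole activations
    neg = rng.normal(size=(200, d)) - offset  # stand-in for negative-pole activations
    C = conceptor(np.vstack([pos, neg]), aperture=1.0)  # pooled across both poles
    print("quota:", quota(C))
    print("steered:", steer(rng.normal(size=d), C, beta=0.5))
```

In this sketch a larger `aperture` pushes the eigenvalues of `C` toward 1 and raises the quota, and OR of two conceptors with the same aperture equals the conceptor of the summed correlation matrices. The explicit inverses in `AND` are fine for small, well-conditioned matrices like these; real hidden-state dimensions would likely need a pseudo-inverse or eigendecomposition instead.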

