CL AI LG Jun 25, 2024

Multi-property Steering of Large Language Models with Dynamic Activation Composition

arXiv:2406.17563v1
36 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of robustly controlling multiple properties in language model generation, an incremental advance that matters for practical applications.

The paper tackles the problem of multi-property steering in large language models, where existing activation steering methods have been evaluated only on single properties and in synthetic settings. The proposed Dynamic Activation Composition method maintains high conditioning while minimizing the impact on fluency, with experiments showing improved performance over baselines.

Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency.
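
To make the two ideas in the abstract concrete, here is a minimal sketch of additive activation steering and of a per-step intensity rule. The abstract does not give formulas, so everything here is an assumption for illustration: the function names (`steer`, `dynamic_alpha`), the use of KL divergence between steered and unsteered next-token distributions as the information-theoretic signal, and the cap `alpha_max` are hypothetical and may differ from the paper's actual method.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def steer(hidden, steering_vectors, alphas):
    """Additive intervention: h' = h + sum_i alpha_i * v_i.

    One (vector, alpha) pair per conditioning property.
    """
    out = list(hidden)
    for vec, alpha in zip(steering_vectors, alphas):
        out = [h + alpha * v for h, v in zip(out, vec)]
    return out

def dynamic_alpha(logits_unsteered, logits_steered, alpha_max=2.0):
    """ASSUMED rule: steer harder when steering changes the next-token
    distribution a lot, and back off when it changes little."""
    p = softmax(logits_steered)
    q = softmax(logits_unsteered)
    return min(kl_divergence(p, q), alpha_max)
```

In a multi-property setting, one would compute `dynamic_alpha` separately for each property at every generation step and pass the resulting list to `steer`, so that each property's intensity is modulated independently.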

Code Implementations: 1 repo