LG · AI · Apr 17

MidSteer: Optimal Affine Framework for Steering Generative Models

Tatiana Gaintseva, Andrew Stepanov, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

arXiv:2605.05220 · 75.9 · h-index: 8

AI Analysis

This work provides a theoretical foundation for steering intermediate representations, addressing a gap in post-deployment alignment and safety for generative models.

The paper formalizes the theory of concept steering in generative models, linking it to affine concept erasure and introducing MidSteer, a framework for minimal-disturbance concept manipulation. MidSteer achieves favorable performance across vision diffusion models and large language models.

Steering intermediate representations has emerged as a powerful strategy for controlling generative models, particularly in post-deployment alignment and safety settings. However, despite its empirical success, it currently lacks a comprehensive theoretical framework. In this paper, we bridge this gap by formalizing the theory of concept steering. First, we establish a link between steering and affine concept erasure, proving that the standard approach for removing unwanted behaviors is a special case of LEACE (a closed-form method for affine erasure). Next, we formulate a principled theoretical framework for concept switching, LEACE-Switch, and characterize the assumptions under which it provides an optimal affine solution. Building on this analysis, we then introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance transformations. We demonstrate that MidSteer performs favorably across a range of tasks, modalities, and architectures, including vision diffusion models and large language models.
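For readers unfamiliar with the affine erasure that the abstract builds on, the following is a minimal NumPy sketch of a LEACE-style eraser, using the closed form published for LEACE: r(x) = x - W⁺ P W (x - μ), where W is the whitening matrix of the representations and P is the orthogonal projector onto the column space of W Σ_XZ. It is not MidSteer or LEACE-Switch, whose formulas are not given in this abstract. The function names `fit_leace` and `erase`, the tolerance `tol`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def fit_leace(X, Z, tol=1e-10):
    """Fit an affine eraser r(x) = x - A (x - mu) in the LEACE closed form.

    X: (n, d) representations; Z: (n, k) concept labels (e.g. one-hot or 0/1).
    Hypothetical helper for illustration, not the authors' code.
    """
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Zc = Z - Z.mean(axis=0)

    sigma_xx = Xc.T @ Xc / n          # (d, d) covariance of X
    sigma_xz = Xc.T @ Zc / n          # (d, k) cross-covariance of X and Z

    # Whitening matrix W = sigma_xx^{-1/2} and its pseudo-inverse,
    # built from the eigendecomposition so rank-deficient cases are handled.
    evals, evecs = np.linalg.eigh(sigma_xx)
    evals = np.clip(evals, 0.0, None)
    keep = evals > tol * evals.max()
    inv_sqrt = np.zeros_like(evals)
    inv_sqrt[keep] = evals[keep] ** -0.5
    sqrt = np.sqrt(evals) * keep
    W = (evecs * inv_sqrt) @ evecs.T
    W_pinv = (evecs * sqrt) @ evecs.T

    # Orthogonal projector onto the column space of W @ sigma_xz.
    U, s, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, s > tol * max(s.max(), 1e-300)]
    P = U @ U.T

    A = W_pinv @ P @ W
    return mu, A


def erase(X, mu, A):
    """Apply r(x) = x - A (x - mu) to every row of X."""
    return X - (X - mu) @ A.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.integers(0, 2, size=(500, 1)).astype(float)   # binary concept
    X = rng.normal(size=(500, 6)) + Z * np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0])

    mu, A = fit_leace(X, Z)
    X_erased = erase(X, mu, A)

    # After erasure the cross-covariance with Z should vanish numerically.
    cov = (X_erased - X_erased.mean(axis=0)).T @ (Z - Z.mean(axis=0)) / len(X)
    print("max |Cov(r(X), Z)|:", np.abs(cov).max())
```

For a binary concept, Σ_XZ is proportional to the difference of the two class means, so when the covariance of X is the identity this map reduces to projecting out the mean-difference direction. That may be the kind of special case the abstract refers to, though the paper's exact statement and assumptions are not reproduced here.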
