LG | AI | Apr 17

MidSteer: Optimal Affine Framework for Steering Generative Models

arXiv:2605.05220 | 75.9 | h-index: 8
AI Analysis

This work provides a theoretical foundation for steering intermediate representations, addressing a gap in post-deployment alignment and safety for generative models.

The paper formalizes the theory of concept steering in generative models, linking it to affine concept erasure and introducing MidSteer, a framework for minimal-disturbance concept manipulation. MidSteer achieves favorable performance across vision diffusion models and large language models.

Steering intermediate representations has emerged as a powerful strategy for controlling generative models, particularly in post-deployment alignment and safety settings. However, despite its empirical success, it currently lacks a comprehensive theoretical framework. In this paper, we bridge this gap by formalizing the theory of concept steering. First, we establish a link between steering and affine concept erasure, proving that the standard approach for removing unwanted behaviors is a special case of LEACE (a closed-form method for affine erasure). Next, we formulate a principled theoretical framework for concept switching, LEACE-Switch, and characterize the assumptions under which it provides an optimal affine solution. Building on this analysis, we then introduce MidSteer (Minimal Disturbance concept Steering), a more general affine framework for concept manipulation that relaxes these assumptions and enables directed, minimal-disturbance transformations. We demonstrate that MidSteer performs favorably across a range of tasks, modalities, and architectures, including vision diffusion models and large language models.
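For readers unfamiliar with the affine erasure the abstract refers to, the following is a minimal NumPy sketch of the LEACE closed form as given in the original LEACE paper: r(x) = x - W^+ P (x - mu), where W is the whitening matrix of the representations and P is the orthogonal projector onto the column space of W Sigma_xz. It is not the MidSteer or LEACE-Switch method, whose formulas the abstract does not state. The function names `fit_leace` and `erase`, the tolerance values, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def fit_leace(X, Z, tol=1e-10):
    """Fit an affine eraser; returns (P, mu) with erase(x) = x - P @ (x - mu).

    X: (n, d) representations. Z: (n,) or (n, k) concept labels.
    Hypothetical helper, not taken from the paper's code.
    """
    X = np.asarray(X, dtype=np.float64)
    Z = np.asarray(Z, dtype=np.float64)
    if Z.ndim == 1:
        Z = Z[:, None]
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    Zc = Z - Z.mean(axis=0)
    sigma_xx = Xc.T @ Xc / n  # covariance of the representations
    sigma_xz = Xc.T @ Zc / n  # cross-covariance with the concept

    # Whitening matrix W = Sigma_xx^{-1/2} and its pseudo-inverse,
    # restricted to directions with non-negligible variance.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > tol * evals.max()
    V, s = evecs[:, keep], np.sqrt(evals[keep])
    W = (V / s) @ V.T
    W_pinv = (V * s) @ V.T

    # Orthogonal projector onto the column space of W @ Sigma_xz.
    U, sv, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, sv > tol * max(sv.max(), 1.0)]
    proj = U @ U.T

    return W_pinv @ proj @ W, mu


def erase(X, P, mu):
    """Apply the fitted affine map row-wise."""
    X = np.asarray(X, dtype=np.float64)
    return X - (X - mu) @ P.T


# Synthetic example: a binary concept shifts the data along one direction.
rng = np.random.default_rng(0)
z_demo = rng.integers(0, 2, size=1000).astype(float)
X_demo = rng.normal(size=(1000, 6)) + np.outer(z_demo, rng.normal(size=6))
P_demo, mu_demo = fit_leace(X_demo, z_demo)
X_clean = erase(X_demo, P_demo, mu_demo)
```

After erasure, the sample cross-covariance between `X_clean` and `z_demo` is zero up to floating-point error. For a binary concept this means the two class-conditional means coincide, so no linear probe can recover the concept from the edited representations on that sample.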

