CV | Apr 10

SHIFT: Steering Hidden Intermediates in Flow Transformers

arXiv:2604.09213 | 48.9 | h-index: 10
AI Analysis

This work addresses the need for flexible, inference-time control over image generation in diffusion models and offers a practical tool for users. Its contribution is incremental, since it builds on existing activation-steering techniques.

The authors tackle concept removal and style manipulation in DiT-based diffusion models for image generation. They propose SHIFT, a lightweight framework that applies learned steering vectors at inference time to suppress unwanted concepts or shift styles, achieving effective control without retraining.

Diffusion models have become a leading approach to high-fidelity image generation. Recent DiT-based diffusion models, in particular, achieve strong prompt adherence while producing high-quality samples. We propose SHIFT, a simple, effective, and lightweight framework for concept removal in DiT diffusion models via targeted manipulation of intermediate activations at inference time, inspired by activation steering in large language models. SHIFT learns steering vectors that are dynamically applied to selected layers and timesteps to suppress unwanted visual concepts while preserving the prompt's remaining content and overall image quality. Beyond suppression, the same mechanism can shift generations into a desired style domain or bias samples toward adding or changing target objects. We demonstrate that SHIFT provides effective and flexible control over DiT generation across diverse prompts and targets without time-consuming retraining.
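To make "steering vectors applied to selected layers and timesteps" concrete, here is a minimal sketch of gated additive activation steering. It assumes the steering vector is already learned and is simply added, scaled by a strength factor, to every token's hidden activation, and only when the current (layer, timestep) pair is selected. The names `SteeringSchedule`, `steer`, `scale`, `layers`, and `timesteps` are hypothetical, and the additive form is an assumption; the abstract does not specify how SHIFT combines its vectors with the activations. A negative `scale` would push activations away from a direction, which is one plausible way to express suppression.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringSchedule:
    """Hypothetical config: which layers/timesteps receive the steering vector."""
    vector: list[float]                  # steering direction (assumed already learned)
    scale: float = 1.0                   # steering strength; negative pushes away
    layers: set[int] = field(default_factory=set)
    timesteps: set[int] = field(default_factory=set)

    def active(self, layer: int, t: int) -> bool:
        # Steer only where both the layer and the timestep are selected.
        return layer in self.layers and t in self.timesteps


def steer(hidden: list[list[float]], layer: int, t: int,
          sched: SteeringSchedule) -> list[list[float]]:
    """Add scale * vector to each token activation if (layer, t) is selected."""
    if not sched.active(layer, t):
        return hidden  # untouched outside the schedule
    for token in hidden:
        if len(token) != len(sched.vector):
            raise ValueError("activation and steering vector dims differ")
    return [[h + sched.scale * v for h, v in zip(token, sched.vector)]
            for token in hidden]


# Usage: two tokens with 2-d activations, steered only at layer 1, t in {2, 3}.
sched = SteeringSchedule(vector=[1.0, -1.0], scale=0.5, layers={1}, timesteps={2, 3})
tokens = [[0.0, 0.0], [1.0, 1.0]]
print(steer(tokens, layer=1, t=2, sched=sched))  # [[0.5, -0.5], [1.5, 0.5]]
print(steer(tokens, layer=0, t=2, sched=sched))  # [[0.0, 0.0], [1.0, 1.0]]
```

In an actual PyTorch DiT, the same gating would typically live in a forward hook (`register_forward_hook`) on the chosen transformer blocks, with the sampler's current timestep read from shared state. That wiring is an assumption here, not something the abstract describes.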
