CV · Oct 26, 2025

Self-Attention Decomposition For Training Free Diffusion Editing

Tharun Anand, Mohammad Hassan Vali, Arno Solin

arXiv:2510.22650v1 · h-index: 5

AI Analysis

This work addresses efficient and interpretable editing in diffusion models for image synthesis and represents an incremental improvement over existing methods.

The paper tackles the problem of precisely controlling diffusion model outputs for targeted image editing. It proposes an analytical, training-free method that derives semantic editing directions from the eigenvectors of pretrained self-attention weight matrices, reducing editing time by 60% relative to current benchmarks.

Diffusion models achieve remarkable fidelity in image synthesis, yet precise control over their outputs for targeted editing remains challenging. A key step toward controllability is to identify interpretable directions in the model's latent representations that correspond to semantic attributes. Existing approaches for finding interpretable directions typically rely on sampling large sets of images or training auxiliary networks, which limits efficiency. We propose an analytical method that derives semantic editing directions directly from the pretrained parameters of diffusion models, requiring neither additional data nor fine-tuning. Our insight is that self-attention weight matrices encode rich structural information about the data distribution learned during training. By computing the eigenvectors of these weight matrices, we obtain robust and interpretable editing directions. Experiments demonstrate that our method produces high-quality edits across multiple datasets while reducing editing time by 60% relative to current benchmarks.
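The abstract does not say which self-attention matrix is decomposed, how non-symmetric matrices are handled, or how a direction is applied to the latent. The sketch below is therefore only an illustration under stated assumptions, not the authors' method. It assumes a projection matrix `W` (for example a query projection), symmetrizes it as `W^T W` so the eigenvectors are real and orthonormal (these are equivalently the right singular vectors of `W`), takes the top-k eigenvectors as candidate directions, and edits features by a simple additive shift. The helper names `editing_directions` and `apply_edit`, and the random stand-in weights, are hypothetical.

```python
import numpy as np


def editing_directions(w: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the top-k eigenvectors of W^T W as rows (hypothetical helper).

    Assumption: the (possibly non-square) attention weight W is symmetrized
    via W^T W so that eigh applies and the eigenvectors are real/orthonormal.
    """
    gram = w.T @ w                            # symmetric PSD matrix, shape (d, d)
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending eigenvalue
    return eigvecs[:, order[:k]].T            # each row is a unit-norm direction


def apply_edit(h: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift features h along one direction with strength alpha.

    Assumption: a plain additive edit; the paper's actual editing rule may differ.
    """
    return h + alpha * direction              # broadcasts over the token axis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_q = rng.standard_normal((64, 64))       # random stand-in for pretrained weights
    dirs = editing_directions(w_q, k=3)       # three candidate editing directions
    h = rng.standard_normal((16, 64))         # 16 tokens of 64-dim features
    h_edit = apply_edit(h, dirs[0], alpha=2.0)
    print(dirs.shape, h_edit.shape)
```

Because the directions come from a one-off eigendecomposition of fixed weights, they can be computed once per model and reused across images, which is consistent with the abstract's claim that no extra data or fine-tuning is needed.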

