CV · Dec 25, 2025

Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints

Mutiara Shabrina, Nova Kurnia Putri, Jefri Satria Ferdiansyah, Sabita Khansa Dewi, Novanto Yudistira

arXiv:2512.21637v1 · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses disentangled image editing for users who need precise attribute modifications, but the advance is incremental because it builds on the existing PPE framework.

The paper tackles the problem of attribute entanglement in text-guided image editing, where modifying a target attribute unintentionally alters other semantic properties, by introducing a sparsity-based constraint using L1 regularization on latent space manipulation. The result is more focused and controlled edits, effectively reducing unintended changes in non-target attributes while preserving facial identity, as demonstrated on the CelebA-HQ dataset.

Text-driven image manipulation often suffers from attribute entanglement, where modifying a target attribute (e.g., adding bangs) unintentionally alters other semantic properties such as identity or appearance. The Predict, Prevent, and Evaluate (PPE) framework addresses this issue by leveraging pre-trained vision-language models for disentangled editing. In this work, we analyze the PPE framework, focusing on its architectural components, including BERT-based attribute prediction and StyleGAN2-based image generation on the CelebA-HQ dataset. Through empirical analysis, we identify a limitation in the original regularization strategy, where latent updates remain dense and prone to semantic leakage. To mitigate this issue, we introduce a sparsity-based constraint using L1 regularization on latent space manipulation. Experimental results demonstrate that the proposed approach enforces more focused and controlled edits, effectively reducing unintended changes in non-target attributes while preserving facial identity.
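To illustrate why an L1 penalty on the latent update yields sparser, more localized edits, here is a minimal sketch. It is not the authors' implementation. The quadratic `toy_edit_grad` is a hypothetical stand-in for the gradient of the real text-guided editing loss, and the proximal-gradient (soft-thresholding) optimizer, the names `soft_threshold` and `sparse_latent_edit`, and all numeric values are assumptions chosen for illustration. The abstract states only that L1 regularization is applied to the latent space manipulation.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_latent_edit(grad_fn, dim, lam=0.1, lr=0.1, steps=200):
    """Minimise edit_loss(delta) + lam * ||delta||_1 by proximal gradient descent.

    grad_fn returns the gradient of the (smooth) edit loss at delta.
    delta is the offset added to the latent code; entries driven exactly
    to zero leave the corresponding latent dimensions untouched.
    """
    delta = np.zeros(dim)
    for _ in range(steps):
        # Gradient step on the smooth edit loss, then L1 shrinkage.
        delta = soft_threshold(delta - lr * grad_fn(delta), lr * lam)
    return delta

# Hypothetical dense update direction: two dimensions carry the target
# attribute, the rest are small "leakage" components.
dense_direction = np.array([2.0, 0.05, -0.03, 0.0, -1.5, 0.02])

def toy_edit_grad(delta):
    # Gradient of the toy loss 0.5 * ||delta - dense_direction||^2.
    return delta - dense_direction

sparse_delta = sparse_latent_edit(toy_edit_grad, dim=dense_direction.size)
print("non-zero entries, dense: ", np.count_nonzero(dense_direction))
print("non-zero entries, sparse:", np.count_nonzero(sparse_delta))
```

In this toy setting the small leakage components are driven exactly to zero while the two dominant dimensions survive with slightly reduced magnitude, which is the qualitative behavior the abstract attributes to the sparsity constraint. How the penalty weight is chosen and which latent space it is applied to are not specified in the text above.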
