Training-Free Multi-Concept Image Editing
This work addresses the problem of preserving identity and capturing non-textual visual concepts when editing images with diffusion models. The contribution is incremental, since it builds on existing optimisation and adapter techniques.
The paper tackles training-free image editing with diffusion models by introducing a framework that combines Optimised DDS with LoRA-driven concept composition, and it reports consistent improvements over existing training-free methods on the InstructPix2Pix and ComposLoRA benchmarks.
Editing images with diffusion models without training remains challenging. While recent optimisation-based methods achieve strong zero-shot edits from text, they struggle to preserve identity or capture details that language alone cannot express. Many visual concepts, such as facial structure, material texture, or object geometry, cannot be conveyed through text prompts alone. To address this gap, we introduce a training-free framework for concept-based image editing that unifies Optimised DDS with LoRA-driven concept composition, in which each concept is represented by the training data of its LoRA. Our approach enables combining and controlling multiple visual concepts directly within the diffusion process, integrating semantic guidance from text with low-level cues from pretrained concept adapters. We further refine DDS for stability and controllability through ordered timesteps, regularisation, and negative-prompt guidance. Quantitative and qualitative results demonstrate consistent improvements over existing training-free diffusion editing methods on the InstructPix2Pix and ComposLoRA benchmarks. Code will be made publicly available.
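The ingredients named in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not specify the exact regulariser, timestep ordering, or composition rule, so everything below is an assumption made for illustration. In particular, `toy_eps` is a hypothetical stand-in for a diffusion noise predictor, the linear noise schedule is invented, the descending timestep order and the L2 pull towards the source latent are guesses at "ordered timesteps" and "regularisation", and `compose_lora` uses the standard weight-merging form of LoRA (base weight plus scaled low-rank products). The update itself follows the usual Delta Denoising Score idea: noise the edited and source latents with the same noise at the same timestep, then descend along the difference of the two noise predictions.

```python
import numpy as np


def compose_lora(W, adapters, scales):
    """Merge LoRA adapters into a base weight: W + sum_i s_i * (B_i @ A_i)."""
    W_eff = W.astype(float).copy()
    for (A, B), s in zip(adapters, scales):
        W_eff += s * (B @ A)
    return W_eff


def toy_eps(z_t, cond, t, W):
    """Hypothetical stand-in for a noise predictor eps(z_t, cond, t)."""
    return np.tanh(z_t @ W.T + cond) * (t / 1000.0)


def guided_eps(z_t, cond, neg, t, W, scale):
    """Classifier-free guidance with a negative prompt as the baseline branch."""
    e_neg = toy_eps(z_t, neg, t, W)
    e_pos = toy_eps(z_t, cond, t, W)
    return e_neg + scale * (e_pos - e_neg)


def dds_edit(z_src, y_src, y_tgt, y_neg, W,
             steps=50, lr=0.1, reg=0.05, scale=7.5, seed=0):
    """Toy DDS loop with ordered timesteps and an L2 pull towards the source."""
    rng = np.random.default_rng(seed)
    z = z_src.astype(float).copy()
    # Ordered timesteps (assumption): sweep from high noise to low noise
    # instead of sampling t at random.
    for t in np.linspace(900.0, 100.0, steps):
        signal = 1.0 - t / 1000.0  # toy linear schedule, not a real one
        noise = rng.standard_normal(z.shape)
        # Same noise and timestep for the edited and the source latent.
        z_t = np.sqrt(signal) * z + np.sqrt(1.0 - signal) * noise
        zs_t = np.sqrt(signal) * z_src + np.sqrt(1.0 - signal) * noise
        # DDS direction: target-branch prediction minus source-branch prediction.
        grad = (guided_eps(z_t, y_tgt, y_neg, t, W, scale)
                - guided_eps(zs_t, y_src, y_neg, t, W, scale))
        # Regularisation (assumption): keep the edit close to the source latent.
        grad = grad + reg * (z - z_src)
        z = z - lr * grad
    return z
```

In this sketch, concept adapters would be merged first with `compose_lora(W, adapters, scales)` and the resulting weight passed to `dds_edit`. When the source and target prompts are identical, the two branches cancel and the latent is left unchanged, which is the property that lets DDS-style updates avoid drifting away from the input image.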