CV · May 8, 2025

MDE-Edit: Masked Dual-Editing for Multi-Object Image Editing via Diffusion Models

Hongyang Zhu, Haipeng Liu, Bo Fu, Yang Wang

arXiv:2505.05101v28.45 citations · h-index: 8

Originality: Incremental advance

AI Analysis

MDE-Edit addresses multi-object image editing for users of diffusion models, offering a training-free solution for complex scenes, though it appears to be an incremental advance built on inference-time optimization.

The paper tackles the problem of multi-object image editing in complex scenes with overlapping objects, where existing methods struggle with inaccurate localization and attribute-object mismatch. The proposed MDE-Edit method achieves improved editing accuracy and visual quality compared to state-of-the-art methods.

Multi-object editing aims to modify multiple objects or regions in complex scenes while preserving structural coherence. This task faces significant challenges in scenarios involving overlapping or interacting objects: (1) Inaccurate localization of target objects due to attention misalignment, leading to incomplete or misplaced edits; (2) Attribute-object mismatch, where color or texture changes fail to align with intended regions due to cross-attention leakage, creating semantic conflicts (e.g., color bleeding into non-target areas). Existing methods struggle with these challenges: approaches relying on global cross-attention mechanisms suffer from attention dilution and spatial interference between objects, while mask-based methods fail to bind attributes to geometrically accurate regions due to feature entanglement in multi-object scenarios. To address these limitations, we propose a training-free, inference-stage optimization approach that enables precise localized image manipulation in complex multi-object scenes, named MDE-Edit. MDE-Edit optimizes the noise latent feature in diffusion models via two key losses: Object Alignment Loss (OAL) aligns multi-layer cross-attention with segmentation masks for precise object positioning, and Color Consistency Loss (CCL) amplifies target attribute attention within masks while suppressing leakage to adjacent regions. This dual-loss design ensures localized and coherent multi-object edits. Extensive experiments demonstrate that MDE-Edit outperforms state-of-the-art methods in editing accuracy and visual quality, offering a robust solution for complex multi-object image manipulation tasks.
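The abstract names the two losses but does not give their formulas, so the following is only a rough sketch of how mask-guided attention losses of this kind might look. The function names, the normalizations, the weights `w_oal` and `w_ccl`, and the assumption that every attention map has already been resized to the mask resolution are all hypothetical and are not taken from the paper.

```python
import numpy as np

def object_alignment_loss(attn_maps, mask, eps=1e-8):
    """Hypothetical OAL: one minus the fraction of attention mass that
    falls inside the segmentation mask, averaged over layers."""
    losses = []
    for attn in attn_maps:  # one 2D map per cross-attention layer
        inside = float((attn * mask).sum())
        total = float(attn.sum()) + eps
        losses.append(1.0 - inside / total)
    return float(np.mean(losses))

def color_consistency_loss(attr_attn, mask, eps=1e-8):
    """Hypothetical CCL: mean attribute attention outside the mask minus
    mean attribute attention inside it (lower means less leakage)."""
    outside = 1.0 - mask
    inside_mean = (attr_attn * mask).sum() / (mask.sum() + eps)
    outside_mean = (attr_attn * outside).sum() / (outside.sum() + eps)
    return float(outside_mean - inside_mean)

def total_loss(obj_attn_maps, attr_attn, mask, w_oal=1.0, w_ccl=1.0):
    """Weighted sum of the two illustrative losses (weights are assumed)."""
    return (w_oal * object_alignment_loss(obj_attn_maps, mask)
            + w_ccl * color_consistency_loss(attr_attn, mask))

# Toy 4x4 scene: the target object occupies the top-left 2x2 block.
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

focused = mask / mask.sum()        # all attention lands on the object
leaky = np.full((4, 4), 1.0 / 16)  # attention spread over the whole image

loss_focused = total_loss([focused], focused, mask)
loss_leaky = total_loss([leaky], leaky, mask)
```

In this toy setup the well-localized attention map yields a lower combined loss than the uniformly spread one. In an inference-stage optimization scheme like the one the abstract describes, a gradient of such a loss with respect to the latent would be used to update it during denoising; that step requires an autodiff framework and a diffusion model and is omitted here.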
