SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow
This work addresses precise, iterative image editing for users of diffusion models, though the contribution is incremental because it builds on existing models such as ControlNet.
Prompt-based image editing models struggle with detailed instructions and local edits, and global image quality often degrades after a single editing step. The paper introduces SPICE, a training-free workflow that outperforms state-of-the-art baselines on a challenging dataset and is preferred by human annotators.
Prompt-based models have demonstrated impressive prompt-following capability in image editing tasks. However, these models still struggle to follow detailed editing prompts or perform local edits. In particular, global image quality often deteriorates immediately after a single editing step. To address these challenges, we introduce SPICE, a training-free workflow that accepts arbitrary resolutions and aspect ratios, accurately follows user requirements, and consistently improves image quality over more than 100 editing steps, while keeping the unedited regions intact. By synergizing the strengths of a base diffusion model and a Canny edge ControlNet model, SPICE robustly handles free-form editing instructions from the user. On a challenging realistic image-editing dataset, SPICE quantitatively outperforms state-of-the-art baselines and is consistently preferred by human annotators. We release the workflow implementation for popular diffusion model Web UIs to support further research and artistic exploration.