Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models
This work addresses the need for precise image manipulation in real-world applications, but its contribution is incremental because it builds on existing diffusion models.
The paper tackles fine-grained real-image editing by proposing a diffusion-based framework with pixel-wise guidance, which outperforms a GAN-based method in editing quality and speed.
Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and then propose a novel diffusion-based image editing framework with pixel-wise guidance that satisfies them. Specifically, we train pixel-classifiers on a small amount of annotated data and use them to infer the segmentation map of a target image. Users then manipulate the map to specify how the image should be edited. We use a pre-trained diffusion model with pixel-wise guidance to generate edited images aligned with the user's intention. The effective combination of the proposed guidance with other techniques enables highly controllable editing while preserving the region outside the edited area, and thus meets our requirements. The experimental results demonstrate that our proposal outperforms the GAN-based method in editing quality and speed.
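To make the idea of pixel-wise guidance concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical linear classifier applied to raw pixel values as a stand-in for the trained pixel-classifiers, omits the diffusion denoising step entirely, and uses masked blending as one plausible way to preserve the region outside the edit (the abstract does not name the "other techniques"). All function and parameter names (`pixelwise_guidance_grad`, `guided_edit_step`, `weights`, `scale`, `edit_mask`) are illustrative assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pixelwise_guidance_grad(x, weights, bias, target_map):
    """Gradient of the summed per-pixel cross-entropy with respect to x.

    x          : (H, W, C) current image estimate
    weights    : (C, K) hypothetical linear pixel-classifier weights (assumption)
    bias       : (K,) classifier bias
    target_map : (H, W) integer class labels, i.e. the user-edited segmentation map
    """
    logits = x @ weights + bias                      # (H, W, K)
    probs = softmax(logits)                          # per-pixel class probabilities
    onehot = np.eye(weights.shape[1])[target_map]    # (H, W, K)
    # d(CE)/d(logits) = probs - onehot; chain rule through the linear layer.
    return (probs - onehot) @ weights.T              # (H, W, C)


def guided_edit_step(x, x_orig, edit_mask, weights, bias, target_map, scale=0.5):
    """One guidance step followed by masked blending.

    In a real diffusion sampler this gradient would be added to the denoising
    update at each timestep; here only the guidance term is applied.
    edit_mask is (H, W) with 1 inside the edited area and 0 outside.
    """
    grad = pixelwise_guidance_grad(x, weights, bias, target_map)
    x_new = x - scale * grad                         # move pixels toward the target classes
    m = edit_mask[..., None]                         # broadcast the mask over channels
    return m * x_new + (1.0 - m) * x_orig            # keep the outside region unchanged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(8, 8, 3))               # toy "image"
    w, b = np.eye(3), np.zeros(3)                    # toy 3-class classifier
    edited_map = np.zeros((8, 8), dtype=int)
    edited_map[2:6, 2:6] = 1                         # user relabels a square region
    mask = (edited_map == 1).astype(float)

    x = image.copy()
    for _ in range(10):
        x = guided_edit_step(x, image, mask, w, b, edited_map)
    print("max change outside mask:", np.abs(x - image)[mask == 0].max())
```

The gradient is computed independently at every pixel, which is what lets the user-edited map steer individual regions, while the blending step restricts changes to the masked area.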