SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing
This work addresses the need for greater user control in image editing, for applications such as object replacement, though it appears incremental because it builds on existing diffusion models.
The authors tackle the problem of limited user control in text-conditional image editing by introducing a new task, Specific Reference Condition Real Image Editing, in which the user supplies a reference image to obtain a more precise outcome. They also propose a fast baseline method, SpecRef, which they report achieves satisfactory performance on typical editing tasks.
Text-conditional image editing based on large diffusion generative models has attracted the attention of both industry and the research community. Most existing methods perform non-reference editing, in which the user can provide only a source image and a text prompt. This restricts the user's control over the characteristics of the editing outcome. To increase user freedom, we propose a new task called Specific Reference Condition Real Image Editing, which allows the user to provide a reference image to further control the outcome, such as replacing an object with a particular one. To accomplish this, we propose a fast baseline method named SpecRef. Specifically, we design a Specific Reference Attention Controller to incorporate features from the reference image, and adopt a mask mechanism to prevent interference between editing and non-editing regions. We evaluate SpecRef on typical editing tasks and show that it can achieve satisfactory performance. The source code is available at https://github.com/jingjiqinggong/specp2p.
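
The abstract does not spell out how the attention controller and the mask interact, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, inside the editing region, source-image queries attend to keys and values taken from the reference image, while outside that region the source image's own attention output is kept unchanged. All names (`attention`, `masked_reference_attention`, `edit_mask`) and the toy tensor shapes are hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Plain scaled dot-product attention for 2-D (tokens, dim) arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def masked_reference_attention(q_src, k_src, v_src, k_ref, v_ref, edit_mask):
    """Blend source and reference attention outputs with a per-token mask.

    Assumption (not confirmed by the abstract): tokens where edit_mask is
    True take their output from attention over the reference features;
    all other tokens keep the ordinary source self-attention output.
    """
    out_src = attention(q_src, k_src, v_src)   # non-editing behaviour
    out_ref = attention(q_src, k_ref, v_ref)   # reference-conditioned behaviour
    m = edit_mask.astype(out_src.dtype)[:, None]  # (tokens, 1) for broadcasting
    return m * out_ref + (1.0 - m) * out_src


# Toy usage: 6 source tokens, 5 reference tokens, feature dimension 8.
rng = np.random.default_rng(0)
q_src, k_src, v_src = (rng.normal(size=(6, 8)) for _ in range(3))
k_ref, v_ref = (rng.normal(size=(5, 8)) for _ in range(2))
edit_mask = np.array([False, False, True, True, False, False])

out = masked_reference_attention(q_src, k_src, v_src, k_ref, v_ref, edit_mask)
```

In this sketch the mask acts as a hard switch per token, so features from the reference image cannot leak into the non-editing region. The blend needs no learned parameters, which is in keeping with the training-free framing in the title.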