ControlFill: Spatially Adjustable Image Inpainting from Prompt Learning
This work addresses the need for fine-grained control in image inpainting by letting users adjust how strongly a masked region is filled with new objects or with extended background, though the contribution appears incremental relative to existing diffusion-based inpainting techniques.
The authors address controllable image inpainting with ControlFill, a framework that trains two separate prompts, one for object creation and one for object removal by background extension. Users can then adjust the guidance intensity spatially, and inference runs without a heavy text encoder.
In this report, I present an inpainting framework named \textit{ControlFill}, which involves training two distinct prompts: one for generating plausible objects within a designated mask (\textit{creation}) and another for filling the region by extending the background (\textit{removal}). During the inference stage, these learned embeddings guide a diffusion network that operates without requiring heavy text encoders. By adjusting the relative significance of the two prompts and employing classifier-free guidance, users can control the intensity of removal or creation. Furthermore, I introduce a method to spatially vary the intensity of guidance by assigning different scales to individual pixels.
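To make the per-pixel guidance idea concrete, here is a minimal sketch that assumes the standard classifier-free guidance combination, with the usual scalar scale replaced by a 2D scale map. The function name `spatially_guided_noise`, the `(C, H, W)` array layout, and the choice of which prediction acts as the baseline are illustrative assumptions, not details taken from the report; the two inputs stand in for noise predictions that a diffusion network would produce under two different conditionings.

```python
import numpy as np


def spatially_guided_noise(eps_base, eps_guided, scale_map):
    """Combine two noise predictions with a per-pixel guidance scale.

    Assumed form (standard classifier-free guidance, applied per pixel):
        eps = eps_base + s(y, x) * (eps_guided - eps_base)

    eps_base, eps_guided: arrays of shape (C, H, W)
    scale_map: array of shape (H, W); 0 keeps the baseline prediction,
        1 returns the guided prediction, values above 1 extrapolate.
    """
    eps_base = np.asarray(eps_base, dtype=np.float64)
    eps_guided = np.asarray(eps_guided, dtype=np.float64)
    scale_map = np.asarray(scale_map, dtype=np.float64)

    if eps_base.shape != eps_guided.shape:
        raise ValueError("noise predictions must have the same shape")
    if scale_map.shape != eps_base.shape[-2:]:
        raise ValueError("scale_map must match the spatial size (H, W)")

    # The (H, W) map broadcasts across the channel axis.
    return eps_base + scale_map * (eps_guided - eps_base)


# Hypothetical usage: no guidance on the left half, strong guidance on the right.
rng = np.random.default_rng(0)
eps_a = rng.standard_normal((4, 8, 8))  # stand-in for the baseline prediction
eps_b = rng.standard_normal((4, 8, 8))  # stand-in for the guided prediction
scales = np.zeros((8, 8))
scales[:, 4:] = 3.0
eps = spatially_guided_noise(eps_a, eps_b, scales)
```

With a constant `scale_map` this reduces to ordinary scalar classifier-free guidance, so the spatial variant adds no extra network evaluations in this sketch; only the final combination step changes.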