Visual Instruction Inversion: Image Editing via Visual Prompting
This work addresses the challenge of precise image editing when text descriptions are insufficient, offering an incremental improvement over existing methods.
The paper tackles the ambiguity of language in text-conditioned image editing. It proposes using visual prompts (before/after image pairs) to learn text-based editing directions, and with just one example pair it achieves results competitive with state-of-the-art text-conditioned image editing frameworks.
Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.
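To make the idea of "inverting" a visual prompt concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a hypothetical frozen editor (`frozen_editor`, a fixed linear map standing in for a pretrained diffusion model), treats images as flat vectors, and uses a plain squared-error loss; all names, dimensions, and hyperparameters here are illustrative assumptions. Only the instruction embedding is optimized, from a single before/after pair, and the learned embedding is then reused on a new image.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, EMB_DIM = 16, 4  # toy sizes, chosen only for illustration

# Hypothetical stand-in for a frozen, pretrained editing model.
# A real system would use a text-conditioned diffusion model here.
W = rng.normal(size=(IMG_DIM, EMB_DIM))


def frozen_editor(image, instruction):
    """Apply the edit encoded by `instruction` to `image` (weights stay fixed)."""
    return image + W @ instruction


def invert_visual_prompt(before, after, steps=2000, lr=0.01):
    """Learn an instruction embedding from one before/after pair.

    Minimizes ||frozen_editor(before, c) - after||^2 by gradient descent
    over c only; the editor itself is never updated.
    """
    c = np.zeros(EMB_DIM)
    for _ in range(steps):
        residual = frozen_editor(before, c) - after
        grad = 2.0 * W.T @ residual  # gradient of the squared error w.r.t. c
        c -= lr * grad
    return c


# One example pair produced by an edit the learner does not see directly.
true_c = rng.normal(size=EMB_DIM)
before = rng.normal(size=IMG_DIM)
after = frozen_editor(before, true_c)

# Invert the visual prompt, then reuse the learned direction on a new image.
learned_c = invert_visual_prompt(before, after)
new_image = rng.normal(size=IMG_DIM)
edited = frozen_editor(new_image, learned_c)
expected = frozen_editor(new_image, true_c)
```

In this toy setting the editor is linear, so a single pair is enough to recover the direction exactly. With a real diffusion model the objective and optimization would be considerably more involved, and the sketch makes no claim about those details.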