CV · Jul 26, 2023

Visual Instruction Inversion: Image Editing via Visual Prompting

arXiv:2307.14331v1 · 33 citations · h-index: 46
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of precise image editing when text descriptions are insufficient, offering an incremental improvement over existing methods.

The paper tackles ambiguous language in text-conditioned image editing by proposing a method that uses visual prompts (before/after image pairs) to learn a text-based editing direction. With just one example pair, it achieves results competitive with state-of-the-art text-conditioned image editing frameworks.

Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more informative and intuitive way to convey ideas. We present a method for image editing via visual prompting. Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images. We leverage the rich, pretrained editing capabilities of text-to-image diffusion models by inverting visual prompts into editing instructions. Our results show that with just one example pair, we can achieve competitive results compared to state-of-the-art text-conditioned image editing frameworks.
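The idea of inverting a visual prompt into an instruction can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not state the objective or the model, so everything below is an assumption made for illustration. A frozen linear function (`edit`, with the hypothetical matrix `WEIGHTS`) stands in for the pretrained diffusion editor, images are short lists of floats, and the instruction is a small vector learned by plain gradient descent on the squared error between the edited "before" image and the "after" image. All names are hypothetical.

```python
# Toy stand-in for a frozen, pretrained editor (an assumption, not a diffusion model).
# Each row maps the instruction vector to a change in one "pixel".
WEIGHTS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [1.0, -1.0],
]

def edit(image, instruction, weights):
    """Apply the frozen toy editor: image + weights @ instruction."""
    return [
        pixel + sum(w * c for w, c in zip(row, instruction))
        for pixel, row in zip(image, weights)
    ]

def invert_visual_prompt(before, after, weights, steps=200, lr=0.1):
    """Learn an instruction vector from one before/after pair.

    Minimizes 0.5 * ||edit(before, c) - after||^2 over c while the
    editor's weights stay fixed.
    """
    dim = len(weights[0])
    instruction = [0.0] * dim
    for _ in range(steps):
        edited = edit(before, instruction, weights)
        residual = [e - a for e, a in zip(edited, after)]
        # Gradient with respect to the instruction is weights^T @ residual.
        grad = [
            sum(weights[i][j] * residual[i] for i in range(len(residual)))
            for j in range(dim)
        ]
        instruction = [c - lr * g for c, g in zip(instruction, grad)]
    return instruction

# One example pair: "after" is produced by a hidden instruction we try to recover.
TRUE_INSTRUCTION = [0.5, -0.25]
BEFORE = [0.1, 0.2, 0.3, 0.4]
AFTER = edit(BEFORE, TRUE_INSTRUCTION, WEIGHTS)

learned = invert_visual_prompt(BEFORE, AFTER, WEIGHTS)
# Reuse the learned instruction to apply the same edit to a new image.
new_edit = edit([0.9, 0.8, 0.7, 0.6], learned, WEIGHTS)
print(learned, new_edit)
```

The point of the sketch is the division of labor: the editor is never updated, only the instruction is optimized, and the learned instruction is then reused unchanged on an image that was not part of the example pair.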

Code Implementations: 1 repo