CV, AI, LG · Nov 16, 2023

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

arXiv:2311.10089v1 · 317 citations · h-index: 29
Originality: Highly original
AI Analysis

This work targets precise image editing for users who need reliable natural-language tools. It is a strong, focused gain rather than a broad paradigm shift.

The paper tackles the inaccurate execution of natural-language instructions in image editing. It presents Emu Edit, a multi-task model that sets state-of-the-art results in instruction-based image editing and is trained across tasks such as region-based editing, free-form editing, and computer vision tasks.

Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. To develop Emu Edit we train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks, all of which are formulated as generative tasks. Additionally, to enhance Emu Edit's multi-task learning abilities, we provide it with learned task embeddings which guide the generation process towards the correct edit type. Both these elements are essential for Emu Edit's outstanding performance. Furthermore, we show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples. This capability offers a significant advantage in scenarios where high-quality samples are scarce. Lastly, to facilitate a more rigorous and informed assessment of instructable image editing models, we release a new challenging and versatile benchmark that includes seven different image editing tasks.
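The learned task embeddings mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the task names, the choice to add the task vector to a timestep embedding, the elementwise `toy_generator`, and the `fit_new_task` helper are all assumptions made for illustration. The sketch shows two ideas: a lookup table with one learnable vector per task, and few-shot adaptation to an unseen task by optimizing only a new task vector while the generator stays frozen.

```python
import random

EMBED_DIM = 4
# Hypothetical task names, loosely following the task families in the abstract.
TASKS = ["region_edit", "free_form_edit", "vision_task"]


def init_task_table(tasks, dim, seed=0):
    """One learnable vector per task, initialised with small random values."""
    rng = random.Random(seed)
    return {task: [rng.gauss(0.0, 0.02) for _ in range(dim)] for task in tasks}


def condition(timestep_emb, task_vec):
    """Assumed injection scheme: add the task vector to the timestep embedding."""
    return [t + v for t, v in zip(timestep_emb, task_vec)]


def toy_generator(weights, cond):
    """Stand-in for a frozen generative model: an elementwise linear map."""
    return [w * c for w, c in zip(weights, cond)]


def fit_new_task(weights, examples, dim, steps=500, lr=0.05):
    """Learn a vector for an unseen task from a few (timestep_emb, target) pairs.

    Only the new task vector is updated; `weights` is never modified.
    """
    vec = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for timestep_emb, target in examples:
            out = toy_generator(weights, condition(timestep_emb, vec))
            for i in range(dim):
                # Gradient of (out_i - target_i)^2 with out_i = w_i * (t_i + v_i).
                grad[i] += 2.0 * weights[i] * (out[i] - target[i])
        # Plain gradient descent on the mean squared error over the examples.
        vec = [v - lr * g / len(examples) for v, g in zip(vec, grad)]
    return vec
```

In this toy setting, calling `fit_new_task` with a handful of examples produced under a hidden task vector recovers that vector while leaving the generator weights untouched, which mirrors the few-labeled-examples setting the abstract describes. A real system would replace `toy_generator` with a diffusion model and use an autodiff framework rather than a hand-written gradient.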

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
