CV, AI, LG · May 23, 2024

EditWorld: Simulating World Dynamics for Instruction-Following Image Editing

arXiv:2405.14785v1 · 37 citations · h-index: 11 · Has Code · MM
Originality: Incremental advance
AI Analysis

This work addresses the problem of making image editing more realistic and dynamic for users, though it appears incremental as it builds on existing diffusion models and instruction-based approaches.

Existing instruction-based image editing methods focus on simple operations and lack an understanding of world dynamics. The paper addresses this limitation by introducing EditWorld, a new task and dataset for world-instructed image editing that simulates realistic physical scenarios, and it reports that its method significantly outperforms existing methods in experiments.

Diffusion models have significantly improved the performance of image editing. Existing methods take various approaches to high-quality image editing, including but not limited to text control, dragging operations, and mask-and-inpainting. Among these, instruction-based editing stands out for its convenience and effectiveness in following human instructions across diverse scenarios. However, it still focuses on simple editing operations such as adding, replacing, or deleting, and falls short of understanding the aspects of world dynamics that convey the realistic, dynamic nature of the physical world. Therefore, this work, EditWorld, introduces a new editing task, namely world-instructed image editing, which defines and categorizes instructions grounded in various world scenarios. We curate a new image editing dataset with world instructions using a set of large pretrained models (e.g., GPT-3.5, Video-LLaVA, and SDXL). To enable sufficient simulation of world dynamics for image editing, EditWorld trains a model on the curated dataset and improves its instruction-following ability with a designed post-edit strategy. Extensive experiments demonstrate that our method significantly outperforms existing editing methods on this new task. Our dataset and code will be available at https://github.com/YangLing0818/EditWorld

Code Implementations: 1 repo
