CV · Apr 24, 2025

Step1X-Edit: A Practical Framework for General Image Editing

Tsinghua
arXiv:2504.17761v5 · 339 citations · h-index: 19 · Has Code
Originality: Incremental advance
AI Analysis

Step1X-Edit provides a practical open-source alternative for general image editing. However, it appears incremental, because it builds on existing multimodal LLM and diffusion methods.

The paper addresses the performance gap between open-source and closed-source image editing models. It introduces Step1X-Edit, which pairs a multimodal LLM with a diffusion image decoder. On GEdit-Bench, a new benchmark proposed in the paper, Step1X-Edit achieves results comparable to proprietary models such as GPT-4o and Gemini2 Flash.

In recent years, image editing models have developed rapidly. Recently unveiled multimodal models such as GPT-4o and Gemini2 Flash have introduced highly promising image editing capabilities. These models can fulfill the vast majority of user-driven editing requirements, which marks a significant advance in image manipulation. However, a large gap remains between open-source algorithms and these closed-source models. In this paper, we therefore aim to release a state-of-the-art image editing model, Step1X-Edit, that provides performance comparable to closed-source models such as GPT-4o and Gemini2 Flash. Specifically, we use a multimodal LLM to process the reference image and the user's editing instruction. A latent embedding is extracted from the LLM and integrated with a diffusion image decoder to obtain the target image. To train the model, we build a data generation pipeline that produces a high-quality dataset. For evaluation, we develop GEdit-Bench, a novel benchmark rooted in real-world user instructions. Experimental results on GEdit-Bench show that Step1X-Edit outperforms existing open-source baselines by a substantial margin and approaches the performance of leading proprietary models, making a significant contribution to the field of image editing.
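The following toy sketch makes the data flow in the abstract concrete: a multimodal encoder fuses the reference image with the instruction into one conditioning embedding, and an iterative decoder turns noise into an edited latent under that conditioning. It is not Step1X-Edit's implementation. Every function name (`mllm_encode`, `denoiser`, `decode`, `edit_image`) is hypothetical. Vectors are plain Python lists, and the "models" are trivial arithmetic stand-ins for learned networks. The update rule is a deterministic interpolation, not a real diffusion sampler.

```python
import random
from typing import List

Vector = List[float]


def mllm_encode(image_feats: Vector, instruction: str, dim: int = 4) -> Vector:
    """Hypothetical MLLM stand-in: fuse reference-image features and the
    instruction text into a single conditioning embedding of size `dim`."""
    # Toy "text features": bucket character codes into `dim` slots.
    text_feats = [0.0] * dim
    for i, ch in enumerate(instruction):
        text_feats[i % dim] += ord(ch) / 1000.0
    # Toy "image features": pool every dim-th value into one slot.
    pooled_image = [sum(image_feats[j::dim]) for j in range(dim)]
    return [a + b for a, b in zip(pooled_image, text_feats)]


def denoiser(x: Vector, ref_latent: Vector, cond: Vector) -> Vector:
    """Hypothetical denoiser stand-in that predicts the clean edited latent.
    A learned network would also depend on the noisy latent `x`; this toy
    ignores it and returns the reference latent shifted by `cond`."""
    return [r + c for r, c in zip(ref_latent, cond)]


def decode(ref_latent: Vector, cond: Vector, steps: int = 10, seed: int = 0) -> Vector:
    """Toy iterative decoder: start from Gaussian noise and move toward the
    denoiser's prediction. The step weight reaches 1.0 on the final step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in ref_latent]
    for k in range(steps):
        x0_hat = denoiser(x, ref_latent, cond)
        alpha = 1.0 / (steps - k)
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, x0_hat)]
    return x


def edit_image(image_feats: Vector, instruction: str, dim: int = 4, steps: int = 10) -> Vector:
    """End-to-end toy pipeline: encode (image, instruction), then decode."""
    cond = mllm_encode(image_feats, instruction, dim)
    ref_latent = image_feats[:dim]  # toy stand-in for a VAE image encoder
    return decode(ref_latent, cond, steps)


if __name__ == "__main__":
    out = edit_image([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], "make the sky red")
    print(len(out), out)
```

In this sketch, the instruction influences the output only through the single embedding `cond`, while the reference image enters twice: once through the encoder and once as the decoder's starting reference. That mirrors the abstract's separation between "process the image and instruction" and "integrate the embedding with a diffusion decoder". How Step1X-Edit actually injects the embedding into its decoder is not specified here.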

Code Implementations: 1 repo
