CV · AI · CL · LG · Jul 18, 2025

NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

arXiv:2507.14119v2 · 30 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the problem of scalable training data creation for image editing models, enabling large-scale training without human labeling effort, though it is incremental in automating existing annotation processes.

The paper tackles the automatic generation of high-quality training triplets (original image, instruction, edited image) for image editing assistants. It develops a modular pipeline that mines such triplets without human intervention, and it releases NHR-Edit, a dataset of 720k triplets that surpasses public alternatives in cross-dataset evaluation, together with Bagel-NHR-Edit, a fine-tuned Bagel model reporting state-of-the-art metrics.

Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets (original image, instruction, edited image), yet mining pixel-accurate examples is hard. Each edit must affect only prompt-specified regions, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models. Inversion and compositional bootstrapping enlarge the mined set by approx. 2.6x, enabling large-scale high-fidelity training data. By automating the most repetitive annotation steps, the approach allows a new scale of training without human labeling effort. To democratize research in this resource-intensive area, we release NHR-Edit, an open dataset of 720k high-quality triplets, curated at industrial scale via millions of guided generations and validator passes, and we analyze the pipeline's stage-wise survival rates, providing a framework for estimating computational effort across different model stacks. In the largest cross-dataset evaluation, it surpasses all public alternatives. We also release Bagel-NHR-Edit, a fine-tuned Bagel model with state-of-the-art metrics.
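The abstract mentions stage-wise survival rates as a basis for estimating computational effort. The arithmetic behind such an estimate can be sketched as follows. This is a minimal illustration, not the authors' framework: the function names are hypothetical, the example rates are placeholders rather than figures from the paper, and only the 2.6x expansion factor and the 720k dataset size come from the abstract.

```python
from math import ceil, prod


def overall_survival(stage_rates):
    """Fraction of raw candidates expected to pass every pipeline stage.

    stage_rates: per-stage pass rates, each in (0, 1]. Stages are assumed
    to filter independently, which is a simplification.
    """
    if not stage_rates:
        raise ValueError("at least one stage rate is required")
    if not all(0.0 < r <= 1.0 for r in stage_rates):
        raise ValueError("each stage rate must be in (0, 1]")
    return prod(stage_rates)


def required_generations(target_triplets, stage_rates, expansion=1.0):
    """Raw generations needed to end up with target_triplets.

    expansion: factor by which bootstrapping enlarges the mined set
    (the abstract reports approx. 2.6x for inversion plus composition).
    """
    if expansion < 1.0:
        raise ValueError("expansion must be >= 1.0")
    mined_needed = target_triplets / expansion
    return ceil(mined_needed / overall_survival(stage_rates))


if __name__ == "__main__":
    # Placeholder rates for three hypothetical stages; NOT taken from the paper.
    rates = [0.5, 0.5, 0.25]
    print(overall_survival(rates))
    print(required_generations(720_000, rates, expansion=2.6))
```

Because the per-stage rates multiply, a single strict stage dominates the total cost, which is why reporting survival per stage rather than one overall yield makes the estimate transferable to a different model stack.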

