AI · May 3

DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents

Qisong Zhang, Wenzhuo Wu, Zhuangzhuang Jia, Yunhao Yang, Huayu Zhang, Xianghao Zang, Zhixiang He, Zhongjiang He, Kongming Liang, Zhanyu Ma

arXiv:2605.01789 · 89.0

Predicted impact: top 21% in AI · last 90 days · Originality: Incremental advance

AI Analysis

For researchers building visual datasets for image editing and multimodal understanding, DataEvolver provides a reusable framework to automate and improve data quality through iterative self-correction and validation, though the current validation is limited to a single rotation task.

DataEvolver introduces a closed-loop visual data engine that iteratively generates, inspects, corrects, filters, and exports visual data (e.g., images, masks, depth maps) via goal-driven agents. In an object-rotation task, it achieves a 5.2% improvement over the base model on SpatialEdit and a 4.8% improvement on a held-out set, with consistent gains from each loop component.

Constructing controllable visual data is a major bottleneck for image editing and multimodal understanding. Useful supervision is rarely produced by a single rendering pass; instead it emerges through iterative generation, inspection, correction, filtering, and export. We present DataEvolver, a closed-loop visual data engine that organizes this process around explicit goals, persistent artifacts, bounded corrective actions, and acceptance decisions. DataEvolver supports multiple artifact types, including RGB images, masks, depth maps, normal maps, meshes, poses, trajectories, and review traces. In the current release, the system operates through two coupled loops: generation-time self-correction within each sample and validation-time self-expansion across dataset rounds. We validate the framework on an image-level object-rotation setting. With a fixed Qwen-Edit LoRA probe, our final Ours+DualGate model outperforms both the unadapted base model and a public multi-angle LoRA on SpatialEdit and a held-out evaluation set. Ablations show a consistent improvement path from scene-aware generation to feedback-driven correction and dual-gated validation. Beyond the released rotation data, our main contribution is a reusable framework for building visual datasets through explicit goal tracking, review, correction, and acceptance loops.
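The two coupled loops described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Sample`, `self_correct`, and `build_dataset`, the callback signatures, and the toy "angle error" payload are all hypothetical, and the round-level acceptance check is only one possible reading of "validation-time self-expansion". The inner function applies a bounded number of corrective actions to a single sample and records a review trace; the outer function adds each round's accepted samples to the dataset only if a validation callback approves the expanded set.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Sample:
    """Hypothetical persistent artifact: a payload plus its review trace."""
    payload: float
    trace: List[str] = field(default_factory=list)


def self_correct(
    sample: Sample,
    inspect: Callable[[Sample], Optional[str]],
    correct: Callable[[Sample, str], None],
    max_corrections: int = 3,
) -> Optional[Sample]:
    """Inner loop: inspect, then apply at most `max_corrections` fixes.

    Returns the sample if it passes inspection, otherwise None.
    """
    for attempt in range(max_corrections + 1):
        issue = inspect(sample)
        if issue is None:
            sample.trace.append("accepted")
            return sample
        if attempt == max_corrections:
            break  # correction budget exhausted
        sample.trace.append(f"correct:{issue}")
        correct(sample, issue)  # mutates the sample in place
    sample.trace.append("rejected")
    return None


def build_dataset(
    seeds: List[float],
    generate: Callable[[float, int], Sample],
    inspect: Callable[[Sample], Optional[str]],
    correct: Callable[[Sample, str], None],
    validate: Callable[[List[Sample]], bool],
    rounds: int = 1,
    max_corrections: int = 3,
) -> List[Sample]:
    """Outer loop: expand the dataset round by round behind a validation gate."""
    dataset: List[Sample] = []
    for round_idx in range(rounds):
        candidates = []
        for seed in seeds:
            kept = self_correct(
                generate(seed, round_idx), inspect, correct, max_corrections
            )
            if kept is not None:
                candidates.append(kept)
        # Assumed gate: keep this round only if the expanded set validates.
        if validate(dataset + candidates):
            dataset.extend(candidates)
    return dataset


# Toy stand-ins (assumptions): payload is an angle error to shrink below 1.0.
def toy_generate(seed: float, round_idx: int) -> Sample:
    return Sample(payload=seed)


def toy_inspect(sample: Sample) -> Optional[str]:
    return "angle_off" if abs(sample.payload) > 1.0 else None


def toy_correct(sample: Sample, issue: str) -> None:
    sample.payload /= 2.0


if __name__ == "__main__":
    data = build_dataset(
        [0.5, 4.0, 100.0], toy_generate, toy_inspect, toy_correct,
        validate=lambda ds: len(ds) > 0,
    )
    for s in data:
        print(s.payload, s.trace)
```

In this toy run the first seed is accepted immediately, the second is accepted after two corrections, and the third is rejected once its correction budget runs out. Capping the number of corrections keeps the loop from running indefinitely on samples that cannot be repaired, which is the role "bounded corrective actions" plays in the description above.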
