SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity
This work addresses the challenge of maintaining structural integrity in design editing for applications such as posters and websites, though it appears incremental, since it builds on prior local-edit models.
The paper tackles the problem of compositional layout and content editing across structured and unstructured domains by introducing SMART-Editor, a framework that preserves global coherence through Reward-Refine and RewardDPO strategies, resulting in up to 15% gains in structured settings and advantages on natural images.
We present SMART-Editor, a framework for compositional layout and content editing across structured (posters, websites) and unstructured (natural images) domains. Unlike prior models that perform local edits, SMART-Editor preserves global coherence through two strategies: Reward-Refine, an inference-time reward-guided refinement method, and RewardDPO, a training-time preference optimization approach using reward-aligned layout pairs. To evaluate model performance, we introduce SMARTEdit-Bench, a benchmark covering multi-domain, cascading edit scenarios. SMART-Editor outperforms strong baselines like InstructPix2Pix and HIVE, with RewardDPO achieving up to 15% gains in structured settings and Reward-Refine showing advantages on natural images. Automatic and human evaluations confirm the value of reward-guided planning in producing semantically consistent and visually aligned edits.
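The abstract names the two strategies but does not specify them, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes a layout is a list of boxes, uses a toy reward (negative pairwise overlap) as a stand-in for the paper's reward, refines by greedy candidate selection, and uses the standard DPO loss for the training-time objective. All names here (`Box`, `toy_reward`, `shift_proposals`, `reward_refine`, `preference_pair`, `dpo_loss`) are hypothetical.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical layout representation: each element is (x, y, width, height).
Box = Tuple[float, float, float, float]
Layout = List[Box]


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def toy_reward(layout: Layout) -> float:
    """Assumed stand-in reward: negative total pairwise overlap (0.0 is best)."""
    total = 0.0
    for i in range(len(layout)):
        for j in range(i + 1, len(layout)):
            total += overlap_area(layout[i], layout[j])
    return -total


def shift_proposals(layout: Layout, step: float = 5.0) -> Iterable[Layout]:
    """Hypothetical proposer: move one box by +/- step along x or y."""
    for i, (x, y, w, h) in enumerate(layout):
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            yield layout[:i] + [(x + dx, y + dy, w, h)] + layout[i + 1:]


def reward_refine(
    layout: Layout,
    propose: Callable[[Layout], Iterable[Layout]],
    reward: Callable[[Layout], float],
    steps: int = 10,
) -> Layout:
    """Inference-time refinement: keep any candidate that scores higher."""
    best, best_score = layout, reward(layout)
    for _ in range(steps):
        for cand in propose(best):
            score = reward(cand)
            if score > best_score:
                best, best_score = cand, score
    return best


def preference_pair(
    candidates: Sequence[Layout], reward: Callable[[Layout], float]
) -> Tuple[Layout, Layout]:
    """Build a (chosen, rejected) pair from the best and worst candidates."""
    ranked = sorted(candidates, key=reward)
    return ranked[-1], ranked[0]


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * log-ratio margin))."""
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    return math.log1p(math.exp(-margin))


# Usage: two overlapping boxes are pulled apart by the refinement loop.
start: Layout = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 10.0, 10.0)]
refined = reward_refine(start, shift_proposals, toy_reward, steps=5)
chosen, rejected = preference_pair([start, refined], toy_reward)
```

In this sketch the same reward function drives both paths: it ranks candidates at inference time and selects the preferred and rejected layouts used as training pairs. A real system would replace `toy_reward` with a learned or rule-based layout reward and obtain the log-probabilities in `dpo_loss` from the policy and a frozen reference model.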