AI | Feb 2

SIDiffAgent: Self-Improving Diffusion Agent

arXiv:2602.02051v1 | 1 citation | h-index: 4 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses unreliable and inconsistent image generation for users of diffusion models, though the contribution appears incremental because it builds on existing models and methods.

The paper tackles limitations of text-to-image diffusion models, such as prompt sensitivity and visual artifacts, by introducing SIDiffAgent, a training-free agentic framework that autonomously manages prompt engineering and artifact removal. It reports an average VQA score of 0.884 on GenAIBench.

Text-to-image diffusion models have revolutionized generative AI, enabling high-quality and photorealistic image synthesis. However, their practical deployment remains hindered by several limitations: sensitivity to prompt phrasing, ambiguity in semantic interpretation (e.g., "mouse" as an animal vs. a computer peripheral), artifacts such as distorted anatomy, and the need for carefully engineered input prompts. Existing methods often require additional training and offer limited controllability, restricting their adaptability in real-world applications. We introduce Self-Improving Diffusion Agent (SIDiffAgent), a training-free agentic framework that leverages the Qwen family of models (Qwen-VL, Qwen-Image, Qwen-Edit, Qwen-Embedding) to address these challenges. SIDiffAgent autonomously manages prompt engineering, detects and corrects poor generations, and performs fine-grained artifact removal, yielding more reliable and consistent outputs. It further incorporates iterative self-improvement by storing a memory of previous experiences in a database. This database of past experiences is then used to inject prompt-based guidance at each stage of the agentic pipeline. SIDiffAgent achieved an average VQA score of 0.884 on GenAIBench, significantly outperforming open-source models, proprietary models, and agentic methods. We will publicly release our code upon acceptance.
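The memory-guided loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which has not been released: every name here (`ExperienceMemory`, `build_guided_prompt`, `run_agent`) is hypothetical, the bag-of-words `embed` function is a toy stand-in for a real embedding model such as Qwen-Embedding, and the `generate` and `evaluate` callables are placeholders for the image generator and the vision-language critic. The sketch only shows the general idea of storing lessons from poor generations and injecting the most similar ones into later prompts.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ExperienceMemory:
    """Hypothetical store of (prompt embedding, lesson) pairs from past runs."""
    records: list = field(default_factory=list)

    def add(self, prompt, lesson):
        self.records.append((embed(prompt), lesson))

    def retrieve(self, prompt, k=2, min_sim=0.1):
        """Return up to k lessons whose prompts are most similar to this one."""
        query = embed(prompt)
        scored = [(cosine(query, e), lesson) for e, lesson in self.records]
        scored = [s for s in scored if s[0] >= min_sim]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [lesson for _, lesson in scored[:k]]


def build_guided_prompt(prompt, memory):
    """Append retrieved lessons to the prompt as plain-text guidance."""
    lessons = memory.retrieve(prompt)
    if not lessons:
        return prompt
    bullets = "\n".join(f"- {lesson}" for lesson in lessons)
    return f"{prompt}\nGuidance from past runs:\n{bullets}"


def run_agent(prompt, generate, evaluate, memory, threshold=0.8, max_rounds=3):
    """Generate, evaluate, and retry with memory guidance until the score passes."""
    best_image, best_score = None, float("-inf")
    for _ in range(max_rounds):
        image = generate(build_guided_prompt(prompt, memory))
        score, lesson = evaluate(prompt, image)  # critic returns a score and a lesson
        if score > best_score:
            best_image, best_score = image, score
        if score >= threshold:
            break
        memory.add(prompt, lesson)  # remember what went wrong for later rounds
    return best_image, best_score
```

In this sketch a failed round writes its lesson to memory, so the next round's prompt already carries that guidance. A real system would likely persist the memory across sessions and apply the retrieved guidance at more than one stage, as the abstract indicates.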
