CV | AI | Sep 30, 2025

Towards Continual Expansion of Data Coverage: Automatic Text-guided Edge-case Synthesis

arXiv:2509.26158v1 | h-index: 2 | Has Code
Originality: Highly original
AI Analysis

This work addresses the bottleneck of manual data curation for AI systems, offering a scalable framework for automated, targeted synthesis to improve reliability.

The paper tackles the problem of dataset bias in deep neural networks by proposing an automated pipeline for text-guided edge-case synthesis, which achieves superior robustness on the FishEye8K object detection benchmark compared to naive augmentation and manually engineered prompts.

The performance of deep neural networks is strongly influenced by the quality of their training data. However, mitigating dataset bias by manually curating challenging edge cases remains a major bottleneck. To address this, we propose an automated pipeline for text-guided edge-case synthesis. Our approach employs a Large Language Model, fine-tuned via preference learning, to rephrase image captions into diverse textual prompts that steer a Text-to-Image model toward generating difficult visual scenarios. Evaluated on the FishEye8K object detection benchmark, our method achieves superior robustness, surpassing both naive augmentation and manually engineered prompts. This work establishes a scalable framework that shifts data curation from manual effort to automated, targeted synthesis, offering a promising direction for developing more reliable and continuously improving AI systems. Code is available at https://github.com/gokyeongryeol/ATES.
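The flow described in the abstract (caption, rephrased prompts, synthesized images) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `synthesize_edge_cases`, `SyntheticSample`, `toy_rephrase`, and `toy_generate` are hypothetical names, the rephraser stands in for the preference-tuned Large Language Model, and the generator stands in for the Text-to-Image model. Producing bounding-box labels for the synthesized images, which an object detection benchmark would require, is not covered here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SyntheticSample:
    """One synthesized training example (hypothetical container)."""
    source_caption: str
    prompt: str
    image: Any


def synthesize_edge_cases(
    captions: List[str],
    rephrase: Callable[[str, int], str],
    generate: Callable[[str], Any],
    prompts_per_caption: int = 2,
) -> List[SyntheticSample]:
    """Rephrase each caption into several prompts and render one image per prompt.

    `rephrase` is assumed to wrap the fine-tuned LLM and `generate` the
    Text-to-Image model; both are injected so the loop stays model-agnostic.
    """
    samples: List[SyntheticSample] = []
    for caption in captions:
        seen = set()  # skip duplicate prompts for the same caption
        for variant in range(prompts_per_caption):
            prompt = rephrase(caption, variant)
            if prompt in seen:
                continue
            seen.add(prompt)
            samples.append(SyntheticSample(caption, prompt, generate(prompt)))
    return samples


# Toy stand-ins (assumptions for illustration only, not real models).
EDGE_CONDITIONS = ["at night in heavy rain", "in dense fog with strong glare"]


def toy_rephrase(caption: str, variant: int) -> str:
    """Append a hard visual condition to the caption."""
    return f"{caption}, {EDGE_CONDITIONS[variant % len(EDGE_CONDITIONS)]}"


def toy_generate(prompt: str) -> str:
    """Return a placeholder string instead of an actual image."""
    return f"<image for: {prompt}>"


if __name__ == "__main__":
    for s in synthesize_edge_cases(
        ["a bus at an intersection"], toy_rephrase, toy_generate
    ):
        print(s.prompt)
```

Passing the rephraser and generator in as callables keeps the sketch independent of any particular model library; in practice each would wrap whatever LLM and Text-to-Image checkpoints are in use.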

Code Implementations: 1 repo
