Automated In-the-Wild Data Collection for Continual AI Generated Image Detection
Addresses detector performance degradation under distribution shifts and newly emerging generative models, for practitioners who need robust AI-generated image detection.
The paper proposes a data-centric continual adaptation framework for AI-generated image detection, achieving +9.14% and +8% average accuracy improvements on two state-of-the-art detectors by combining in-the-wild and generator-driven data.
The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data-centric continual adaptation framework for updating detectors in evolving environments. We show that both in-the-wild data and generator-driven data are essential for adapting detectors. We introduce an automated, weakly supervised pipeline for constructing in-the-wild datasets through fact-check article retrieval. Additionally, we demonstrate that incorporating even a small amount of generator-driven data during training enables effective adaptation to newly emerging models, while combining it with in-the-wild data within a continual learning framework enables robust adaptation and mitigates catastrophic forgetting. Extensive experiments on two state-of-the-art detectors show significant improvements of +9.14% and +8% in average accuracy, respectively.
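The abstract does not say which continual learning method is used. As one illustration of combining in-the-wild data, a small amount of generator-driven data, and memory of earlier data in a single detector update, the sketch below assumes a rehearsal (experience replay) strategy with a reservoir-sampled buffer. This is a standard technique swapped in for illustration, not the paper's method. The names `Sample`, `ReplayBuffer`, `build_update_set`, `gen_fraction`, and `replay_fraction` are hypothetical, and the default fractions are placeholders rather than values reported in the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    path: str
    label: int   # 1 = AI-generated, 0 = real
    source: str  # "wild", "generator", or "replay" (hypothetical tags)


class ReplayBuffer:
    """Hypothetical fixed-capacity memory of past samples (reservoir sampling)."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: list[Sample] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: Sample) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Algorithm R: keep the new item with probability capacity / seen,
            # so every sample seen so far is equally likely to be stored.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def build_update_set(wild, generator, replay, gen_fraction=0.1,
                     replay_fraction=0.3, size=1000, seed=0):
    """Hypothetical helper: compose one adaptation set from three pools."""
    rng = random.Random(seed)
    n_gen = round(size * gen_fraction)
    n_replay = round(size * replay_fraction)
    n_wild = size - n_gen - n_replay
    if n_wild < 0:
        raise ValueError("gen_fraction + replay_fraction must not exceed 1")

    def draw(pool, k):
        if k == 0:
            return []
        if not pool:
            raise ValueError("cannot draw from an empty pool")
        # Sample with replacement so a small generator-driven pool still fills its share.
        return [rng.choice(pool) for _ in range(k)]

    batch = draw(wild, n_wild) + draw(generator, n_gen) + draw(replay, n_replay)
    rng.shuffle(batch)
    return batch
```

After each update, the new in-the-wild and generator-driven samples could be passed to `ReplayBuffer.add`, so later updates keep rehearsing earlier distributions. This is the general mechanism rehearsal methods use to reduce catastrophic forgetting.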