Improving Interpretability and Robustness for the Detection of AI-Generated Images
This work addresses the robustness of AI-generated image detection, which is crucial for security and media integrity, though the contribution is incremental because it builds on existing methods.
The paper tackles poor generalization in AI-generated image detectors by analyzing existing methods and proposing two improvements, which increase the mean out-of-distribution classification score by up to 6% for cross-model transfer.
With the growing abilities of generative models, detecting artificial content is becoming an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next, we propose two ways to improve robustness: removing harmful components of the embedding vector, and selecting the best-performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.
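The first idea, removing harmful components of the embedding vector, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes synthetic 8-dimensional vectors in place of frozen CLIP embeddings, a ridge-regression linear probe, and a simple one-at-a-time ablation scored on a held-out generator. The helper names (`fit_linear_probe`, `mask_components`, `find_harmful_components`, `make_split`) and the `min_gain` threshold are hypothetical. The attention-head selection idea is not covered here.

```python
import numpy as np


def fit_linear_probe(X, y, l2=1e-2):
    """Ridge-regression probe on frozen embeddings; labels y are in {0, 1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    t = 2.0 * y - 1.0                              # map labels to {-1, +1}
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ t)


def accuracy(w, X, y):
    """Fraction of samples whose probe score has the correct sign."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((Xb @ w > 0).astype(int) == y))


def mask_components(X, drop):
    """Return a copy of X with the listed embedding components zeroed."""
    Xm = X.copy()
    Xm[:, list(drop)] = 0.0
    return Xm


def find_harmful_components(X_tr, y_tr, X_val, y_val, min_gain=0.05):
    """Indices whose removal raises held-out-generator accuracy by > min_gain."""
    base = accuracy(fit_linear_probe(X_tr, y_tr), X_val, y_val)
    harmful = []
    for j in range(X_tr.shape[1]):
        w = fit_linear_probe(mask_components(X_tr, [j]), y_tr)
        gain = accuracy(w, mask_components(X_val, [j]), y_val) - base
        if gain > min_gain:
            harmful.append(j)
    return harmful


def make_split(rng, n, spurious_sign):
    """Synthetic stand-in for embeddings of real (0) and generated (1) images.

    Component 0 carries a cue shared by all generators; component 1 carries a
    generator-specific cue whose direction flips with spurious_sign.
    """
    y = rng.integers(0, 2, size=n)
    t = 2 * y - 1
    X = rng.normal(size=(n, 8))
    X[:, 0] += 1.5 * t
    X[:, 1] += 3.0 * spurious_sign * t
    return X, y


rng = np.random.default_rng(0)
X_tr, y_tr = make_split(rng, 2000, spurious_sign=+1)    # training generator
X_val, y_val = make_split(rng, 1000, spurious_sign=-1)  # unseen generator

acc_before = accuracy(fit_linear_probe(X_tr, y_tr), X_val, y_val)
harmful = find_harmful_components(X_tr, y_tr, X_val, y_val)
w_clean = fit_linear_probe(mask_components(X_tr, harmful), y_tr)
acc_after = accuracy(w_clean, mask_components(X_val, harmful), y_val)
print(harmful, acc_before, acc_after)
```

In this toy setup the probe leans on the generator-specific component during training, so it fails on the unseen generator until that component is masked. With real embeddings, the held-out split used to pick components would need to be separate from the final OOD test set to avoid an optimistic estimate.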