Open Set Classification of GAN-based Image Manipulations via a ViT-based Hybrid Architecture
The work addresses the challenge of detecting unseen image manipulations for forensic and security applications, but it is incremental, as it builds on existing ViT and hybrid architectures.
The paper tackles the classification of AI-manipulated images, such as synthetic faces, in open-set scenarios where the manipulation algorithm is not seen during training. It proposes a ViT-based hybrid method with a rejection option and demonstrates its effectiveness on facial attribute editing and GAN attribution.
Classification of AI-manipulated content, which aims to distinguish between different types of manipulation, is receiving great attention. Most methods developed so far fail in the open-set scenario, that is, when the algorithm used for the manipulation is not represented in the training set. In this paper, we focus on the classification of synthetic face generation and manipulation in open-set scenarios and propose a method for classification with a rejection option. The proposed method combines Vision Transformers (ViT) with a hybrid approach for simultaneous classification and localization. The ViT module exploits feature-map correlation, while a localization branch acts as an attention mechanism that forces the model to learn per-class discriminative features associated with the forgery when the manipulation is applied locally in the image. Rejection is performed by considering several strategies that analyze the model's output layers. The effectiveness of the proposed method is assessed on the classification of facial attribute editing and on GAN attribution.
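The abstract does not specify which rejection strategies are used. As an illustration only, the sketch below shows two generic output-layer baselines for open-set rejection: thresholding the maximum softmax probability and thresholding the maximum raw logit. These are assumed stand-ins, not the paper's method; the names `classify_with_rejection`, `UNKNOWN`, and the threshold values are hypothetical.

```python
import math
from typing import List, Sequence

UNKNOWN = -1  # hypothetical label for samples rejected as an unseen manipulation


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the classifier's output logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_rejection(logits: Sequence[float], threshold: float,
                            strategy: str = "msp") -> int:
    """Return the predicted known-class index, or UNKNOWN if confidence is too low.

    Assumed strategies (generic baselines, not taken from the paper):
      "msp"       -> maximum softmax probability compared against threshold
      "max_logit" -> maximum raw logit compared against threshold
    """
    if strategy == "msp":
        scores = softmax(logits)
    elif strategy == "max_logit":
        scores = list(logits)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    best = max(range(len(scores)), key=scores.__getitem__)
    # Accept the top class only if its score clears the threshold; otherwise reject.
    return best if scores[best] >= threshold else UNKNOWN


if __name__ == "__main__":
    # A confident prediction over three known manipulation classes is accepted.
    print(classify_with_rejection([4.0, 0.5, 0.2], threshold=0.9))
    # A flat output distribution is rejected as a possibly unseen manipulation.
    print(classify_with_rejection([1.0, 0.9, 0.8], threshold=0.9))
```

In practice the threshold would typically be calibrated on held-out samples from the known classes, trading off the rate of correctly accepted known manipulations against the rate of rejected unknown ones.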