Few-Shot Class-Incremental Model Attribution Using Learnable Representation From CLIP-ViT Features
This work addresses the need for efficient model attribution in the forensic analysis of synthetic images as generative AI evolves, though the contribution appears incremental, since it adapts existing few-shot class-incremental learning (FSCIL) mechanisms to a new domain.
The paper tackles model attribution for continuously emerging generative AI models by proposing a few-shot class-incremental learning approach that uses learnable representations built from CLIP-ViT features. In experiments, the method extends effectively from earlier generative models to recent ones.
Recently, images that distort or fabricate facts using generative models have become a social concern. To cope with the continuous evolution of generative artificial intelligence (AI) models, model attribution (MA) is necessary beyond mere detection of synthetic images. However, current deep learning-based MA methods must be retrained from scratch with new data to recognize unseen models, which is time-consuming and data-intensive. This work proposes a new strategy for dealing with persistently emerging generative models. We adapt few-shot class-incremental learning (FSCIL) mechanisms to the MA problem in order to attribute images to novel generative AI models. Whereas existing FSCIL approaches focus on object classification using high-level information, MA requires analyzing low-level details such as color and texture in synthetic images. We therefore use a learnable representation drawn from different levels of CLIP-ViT features. To learn an effective representation, we propose the Adaptive Integration Module (AIM), which computes a weighted sum of CLIP-ViT block features for each image, enhancing the ability to identify generative models. Extensive experiments show that our method extends effectively from prior generative models to recent ones.
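The per-image weighted sum over CLIP-ViT block features can be illustrated with a minimal sketch. The abstract states only that AIM computes a weighted sum of block features for each image; everything else here is an assumption made for illustration, not the authors' implementation: the function names `softmax` and `integrate_blocks`, the scoring vector `score_vec`, the use of a dot product to score each block, and the softmax normalization of the weights. Plain Python lists stand in for the feature tensors.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def integrate_blocks(
    block_feats: Sequence[Sequence[float]],
    score_vec: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse per-block features of ONE image into a single vector.

    block_feats: one feature vector per transformer block (all of equal length).
    score_vec:   hypothetical learnable vector used to score each block
                 (an assumption; the source does not specify how weights arise).
    Returns (fused_feature, block_weights).
    """
    # Score each block by a dot product with the (assumed) learnable vector.
    scores = [sum(f * w for f, w in zip(feat, score_vec)) for feat in block_feats]
    # Normalize scores so the block weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(block_feats[0])
    # Weighted sum across blocks, computed per feature dimension.
    fused = [
        sum(weights[b] * block_feats[b][d] for b in range(len(block_feats)))
        for d in range(dim)
    ]
    return fused, weights


# Usage: two blocks with 2-dimensional features; a zero scoring vector
# gives uniform weights, so the fused feature is the plain average.
fused, weights = integrate_blocks([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])
print(weights)  # [0.5, 0.5]
print(fused)    # [2.0, 1.0]
```

Because the weights depend on the image's own block features, two images can emphasize different blocks, which matches the abstract's motivation that low-level cues such as color and texture may matter more for attribution than the high-level features used in object classification.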