FAME: A Lightweight Spatio-Temporal Network for Model Attribution of Face-Swap Deepfakes
This work addresses the need for forensic tools that identify the source of deepfake manipulations, a capability crucial for digital security and media integrity, and it proposes a new method for a known bottleneck.
The paper tackles model attribution for face-swap deepfakes by introducing FAME, a lightweight spatio-temporal network that outperforms existing methods in both accuracy and runtime on three datasets.
The widespread emergence of face-swap Deepfake videos poses growing risks to digital security, privacy, and media integrity, necessitating effective forensic tools for identifying the source of such manipulations. Although most prior research has focused primarily on binary Deepfake detection, the task of model attribution -- determining which generative model produced a given Deepfake -- remains underexplored. In this paper, we introduce FAME (Fake Attribution via Multilevel Embeddings), a lightweight and efficient spatio-temporal framework designed to capture subtle generative artifacts specific to different face-swap models. FAME integrates spatial and temporal attention mechanisms to improve attribution accuracy while remaining computationally efficient. We evaluate our model on three challenging and diverse datasets: Deepfake Detection and Manipulation (DFDM), FaceForensics++, and FakeAVCeleb. Results show that FAME consistently outperforms existing methods in both accuracy and runtime, highlighting its potential for deployment in real-world forensic and information security applications.