CV Jan 26, 2025

Classifying Deepfakes Using Swin Transformers

arXiv:2501.15656v2, 3.67 citations, h-index: 2

Originality: Synthesis-oriented

AI Analysis

The paper addresses the challenge of verifying digital media authenticity for security purposes, but its contribution is incremental: it adapts an existing transformer architecture to a specific domain.

This study tackled the problem of detecting deepfake images by applying Swin Transformers, achieving a test accuracy of 71.29% and outperforming conventional CNN-based architectures like VGG16 and ResNet18.

The proliferation of deepfake technology poses significant challenges to the authenticity and trustworthiness of digital media, necessitating the development of robust detection methods. This study explores the application of Swin Transformers, a state-of-the-art architecture leveraging shifted windows for self-attention, in detecting and classifying deepfake images. Using the Real and Fake Face Detection dataset by Yonsei University's Computational Intelligence Photography Lab, we evaluate the Swin Transformer and hybrid models such as Swin-ResNet and Swin-KNN, focusing on their ability to identify subtle manipulation artifacts. Our results demonstrate that the Swin Transformer outperforms conventional CNN-based architectures, including VGG16, ResNet18, and AlexNet, achieving a test accuracy of 71.29%. Additionally, we present insights into hybrid model design, highlighting the complementary strengths of transformer and CNN-based approaches in deepfake detection. This study underscores the potential of transformer-based architectures for improving accuracy and generalizability in image-based manipulation detection, paving the way for more effective countermeasures against deepfake threats.
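The "shifted windows" mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code. It is a minimal, dependency-free illustration of the general Swin idea, assuming a toy 4x4 grid of token ids and a window size of 2. The helper names `cyclic_shift` and `window_partition` are hypothetical, and the attention masking that the real architecture applies to wrapped-around tokens is omitted.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` positions, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks (row-major)."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append([grid[r][c]
                            for r in range(top, top + window)
                            for c in range(left, left + window)])
    return windows


# Toy 4x4 grid of token ids 0..15 (an assumption for illustration only).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular layer: self-attention would be computed inside each fixed window.
regular = window_partition(grid, 2)

# Shifted layer: shift by half the window size, then partition again, so each
# new window mixes tokens that sat in different windows in the regular layer.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10]
```

In this toy example the first shifted window holds tokens 5, 6, 9, and 10, each of which belonged to a different window before the shift. Alternating the two layouts is what lets information flow across window boundaries while attention stays local.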
