On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models
The work highlights a critical security weakness in deepfake detection systems. The contribution is incremental but important for cybersecurity and media-integrity applications.
The paper investigates the vulnerability of deepfake detectors to black-box attacks generated by Denoising Diffusion Models (DDMs). It finds that a single denoising diffusion step can significantly reduce the likelihood of detection without introducing perceptible image modifications. Training detectors on attack examples helps to some extent, but detectors trained on fully diffusion-based deepfakes generalize poorly to these attacks.
The detection of malicious deepfakes is a constantly evolving problem that requires continuous monitoring of detectors to ensure they can detect image manipulations generated by the latest emerging models. In this paper, we investigate the vulnerability of single-image deepfake detectors to black-box attacks created by the newest generation of generative methods, namely Denoising Diffusion Models (DDMs). Our experiments are run on FaceForensics++, a widely used deepfake benchmark consisting of manipulated images generated with various techniques for face identity swapping and face reenactment. Attacks are crafted through guided reconstruction of existing deepfakes with a proposed DDM approach for face restoration. Our findings indicate that employing just a single denoising diffusion step in the reconstruction of a deepfake can significantly reduce the likelihood of detection without introducing any perceptible image modifications. Training detectors on attack examples was somewhat effective, but discriminators trained on fully diffusion-based deepfakes exhibited limited generalizability when presented with our attacks.
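To make the idea of a single denoising diffusion step concrete, the following is a minimal sketch of one noise-then-reconstruct pass under the standard DDPM formulation. It is not the paper's guided face-restoration model. The function names (`make_alpha_bar`, `forward_noise`, `one_step_reconstruct`), the linear noise schedule and its values, and the `eps_model` callable (a stand-in for a trained noise-prediction network) are all assumptions introduced for illustration.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule.

    The schedule values are assumed for illustration, not taken from the paper.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


def one_step_reconstruct(x_t, t, alpha_bar, eps_model):
    """Estimate x_0 from x_t with a single noise prediction.

    eps_model is a hypothetical callable (x_t, t) -> predicted noise,
    standing in for a trained denoising network.
    """
    eps_hat = eps_model(x_t, t)
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    # Images are assumed to be scaled to [-1, 1].
    return np.clip(x0_hat, -1.0, 1.0)
```

With a small timestep `t`, the injected noise is slight, so the reconstruction stays visually close to the input while its low-level statistics are rewritten by the denoiser. In this sketch, passing an oracle such as `lambda x, t: eps` (the true noise returned by `forward_noise`) recovers the input up to floating-point error; a real attack would substitute a trained network, and the paper additionally guides the reconstruction.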