KoDF: A Large-scale Korean DeepFake Detection Dataset
KoDF provides a domain-specific resource for detecting deepfakes in Korean content, but the contribution is incremental because it builds on existing dataset efforts.
The authors address deepfake detection by constructing KoDF, a large-scale dataset of synthesized and real videos focused on Korean subjects. They show that its distribution differs from those of existing datasets and emphasize the need for multiple datasets to achieve real-world generalization.
A variety of effective face-swap and face-reenactment methods have been publicized in recent years, democratizing face synthesis technology to a great extent. Videos generated this way have come to be called deepfakes, a term that carries a negative connotation because of the various social problems they have caused. Facing the emerging threat of deepfakes, we have built the Korean DeepFake Detection Dataset (KoDF), a large-scale collection of synthesized and real videos focused on Korean subjects. In this paper, we provide a detailed description of the methods used to construct the dataset, experimentally show the discrepancy between the distributions of KoDF and existing deepfake detection datasets, and underline the importance of using multiple datasets for real-world generalization. KoDF is publicly available at https://moneybrain-research.github.io/kodf in its entirety (i.e., real clips, synthesized clips, adversarially attacked clips, and metadata).