AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit
This work addresses the need for standardized benchmarking in audio deepfake detection, for both researchers and practitioners. The contribution is incremental in that it builds on existing datasets and methods.
The authors systematically review 28 audio deepfake datasets and release AUDDT, an open-source benchmarking toolkit that automates the evaluation of pretrained detectors across these datasets. The evaluation reveals notable differences in detection performance across conditions and manipulation types.
With the prevalence of artificial intelligence (AI)-generated content, such as audio deepfakes, a large body of recent work has focused on developing deepfake detection techniques. However, most models are evaluated on a narrow set of datasets, leaving their generalization to real-world conditions uncertain. In this paper, we systematically review 28 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT (https://github.com/MuSAELab/AUDDT). The goal of this toolkit is to automate the evaluation of pretrained detectors across these 28 datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors. We start by showcasing the usage of the toolkit, the composition of our benchmark, and the breakdown of different deepfake subgroups. Next, using a widely adopted pretrained deepfake detector, we present in- and out-of-domain detection results, revealing notable differences across conditions and audio manipulation types. Lastly, we analyze the limitations of these existing datasets and the gap between them and practical deployment scenarios.
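To make the idea of scoring one pretrained detector across many datasets concrete, the following is a minimal sketch. It is not AUDDT's API: the function names `equal_error_rate` and `evaluate_across_datasets`, the `score_fn` callable, and the choice of equal error rate (EER) as the metric are all assumptions made for illustration. The abstract does not state which metric the toolkit reports.

```python
import numpy as np


def equal_error_rate(labels, scores):
    """Approximate EER by sweeping thresholds over the observed scores.

    Assumed convention (not taken from the paper): label 1 = fake,
    label 0 = real, and a higher score means "more likely fake".
    Both classes must be present in `labels`.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Include +inf so that "flag nothing as fake" is also a candidate.
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        predicted_fake = scores >= t
        false_alarm = np.mean(predicted_fake[~labels])  # real flagged as fake
        miss = np.mean(~predicted_fake[labels])         # fake not flagged
        gap = abs(false_alarm - miss)
        if gap < best_gap:
            best_gap, eer = gap, (false_alarm + miss) / 2
    return float(eer)


def evaluate_across_datasets(score_fn, datasets):
    """Apply one scoring function to every dataset and report EER per dataset.

    `datasets` maps a dataset name to a (clips, labels) pair; `score_fn`
    is a hypothetical stand-in for a pretrained detector's inference call.
    """
    return {
        name: equal_error_rate(labels, score_fn(clips))
        for name, (clips, labels) in datasets.items()
    }


# Toy usage: the "clips" are already scores, so the detector is the identity.
toy = {
    "in_domain": ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]),
    "out_of_domain": ([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1]),
}
results = evaluate_across_datasets(lambda clips: np.asarray(clips), toy)
```

Reporting one number per dataset, rather than a pooled score, is what lets in-domain and out-of-domain gaps show up side by side.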