SD | AI | CR | AS | May 26, 2025

STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution

arXiv:2505.19644v3 | 5 citations | h-index: 10 | INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

This paper addresses the limited progress in deepfake speech source tracing for forensic and detection applications. The contribution is incremental: it creates a dataset rather than proposing a new method.

The authors tackled the lack of a dedicated dataset for deepfake speech source tracing by introducing STOPA, a systematically varied dataset with 700k samples from 13 synthesisers, which improves attribution accuracy for forensic analysis and detection.

A key research area in deepfake speech detection is source tracing - determining the origin of synthesised utterances. The approaches may involve identifying the acoustic model (AM), vocoder model (VM), or other generation-specific parameters. However, progress is limited by the lack of a dedicated, systematically curated dataset. To address this, we introduce STOPA, a systematically varied and metadata-rich dataset for deepfake speech source tracing, covering 8 AMs, 6 VMs, and diverse parameter settings across 700k samples from 13 distinct synthesisers. Unlike existing datasets, which often feature limited variation or sparse metadata, STOPA provides a systematically controlled framework covering a broader range of generative factors, such as the choice of the vocoder model, acoustic model, or pretrained weights, ensuring higher attribution reliability. This control improves attribution accuracy, aiding forensic analysis, deepfake detection, and generative model transparency.
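To make the open-set setting concrete, here is a minimal Python sketch of how per-utterance metadata of this kind could be used to hold out entire synthesisers for evaluation. The `Utterance` fields, the placeholder model names, and the `open_set_split` helper are all hypothetical illustrations and are not STOPA's actual schema or tooling. As a simplification, a synthesiser is identified here by its (acoustic model, vocoder model) pair only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    # Hypothetical metadata fields; the real dataset's schema may differ.
    utt_id: str
    acoustic_model: str
    vocoder_model: str


def open_set_split(utterances, held_out_synthesisers):
    """Split utterances into known-source and unseen-source lists.

    A synthesiser is keyed by its (acoustic_model, vocoder_model) pair,
    which is an assumption made for this sketch.
    """
    known, unseen = [], []
    for utt in utterances:
        key = (utt.acoustic_model, utt.vocoder_model)
        if key in held_out_synthesisers:
            unseen.append(utt)
        else:
            known.append(utt)
    return known, unseen


# Placeholder model names, chosen only for illustration.
utterances = [
    Utterance("u1", "am_a", "vm_x"),
    Utterance("u2", "am_a", "vm_y"),
    Utterance("u3", "am_b", "vm_x"),
    Utterance("u4", "am_b", "vm_y"),
]

# Hold out one (AM, VM) combination so it never appears among known sources.
known, unseen = open_set_split(utterances, {("am_b", "vm_y")})
```

In this toy split, an attribution system would be trained only on `known` and then asked to flag the utterances in `unseen` as coming from a source it has not encountered, which is the situation rich per-sample metadata makes easy to construct.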

