SD, AI, AS | Sep 18, 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

arXiv:2409.17285v2 | 43 citations | h-index: 28
AI Analysis

This work addresses the shortage of realistic datasets available to researchers and developers building robust speech security systems, though it is incremental in that it builds on existing datasets such as VoxCeleb1.

The paper tackles the lack of real-world data for speech deepfake detection and spoofing-robust speaker verification by introducing SpoofCeleb, a dataset with over 2.5 million utterances from 1,251 speakers under varied acoustic conditions, and provides baseline results for these tasks.

This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Robust recognition systems must be trained on speech data recorded in varied acoustic environments with different levels of noise. However, current datasets typically include clean, high-quality recordings (bona fide data) due to the requirements for TTS training; studio-quality or well-recorded read speech is typically necessary to train TTS models. Current SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. SpoofCeleb leverages a fully automated pipeline we developed that processes the VoxCeleb1 dataset, transforming it into a suitable form for TTS training. We subsequently train 23 contemporary TTS systems. SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions. The dataset includes carefully partitioned training, validation, and evaluation sets with well-controlled experimental protocols. We present baseline results for both SDD and SASV tasks. All data, protocols, and baselines are publicly available at https://jungjee.github.io/spoofceleb.

