LJ-Spoof: A Generatively Varied Corpus for Audio Anti-Spoofing and Synthesis Source Tracing
This work addresses the shortage of training data available to researchers and practitioners in audio security, though the contribution is incremental: it delivers a dataset rather than a new method.
The authors tackled the lack of diverse datasets for audio anti-spoofing and synthesis-source tracing by introducing LJ-Spoof, a speaker-specific corpus with systematic variations in model architectures, synthesis pipelines, and generative parameters, resulting in over 3 million utterances across 30 TTS families and 500 subsets.
Speaker-specific anti-spoofing and synthesis-source tracing are central challenges in audio anti-spoofing. Progress has been hampered by the lack of datasets that systematically vary model architectures, synthesis pipelines, and generative parameters. To address this gap, we introduce LJ-Spoof, a speaker-specific, generatively diverse corpus that systematically varies prosody, vocoders, generative hyperparameters, bona fide prompt sources, training regimes, and neural post-processing. The corpus spans one speaker (including studio-quality recordings), 30 TTS families, 500 generatively variant subsets, 10 bona fide neural-processing variants, and more than 3 million utterances. This variation-dense design enables robust speaker-conditioned anti-spoofing and fine-grained synthesis-source tracing. We further position this dataset as both a practical reference training resource and a benchmark evaluation suite for anti-spoofing and source tracing.
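To illustrate what "systematically varies" can mean in practice, the following is a minimal sketch of enumerating variant subsets as a grid over generative factors, with each combination doubling as a source-tracing label. It is not the authors' pipeline: the class `VariantSubset`, the function `enumerate_subsets`, the choice of three factors, and all factor values are hypothetical placeholders, and the abstract does not state that the corpus is a full Cartesian grid.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VariantSubset:
    """One hypothetical spoofed subset, labeled by the factors that produced it."""
    tts_family: str
    vocoder: str
    temperature: float

    @property
    def source_label(self) -> str:
        # Fine-grained source-tracing label: every factor is part of the class.
        return f"{self.tts_family}|{self.vocoder}|T={self.temperature}"


def enumerate_subsets(tts_families, vocoders, temperatures):
    """Return one VariantSubset per combination of the given factor values."""
    return [
        VariantSubset(family, vocoder, temp)
        for family, vocoder, temp in product(tts_families, vocoders, temperatures)
    ]


# Placeholder factor values (assumptions, not taken from the corpus).
subsets = enumerate_subsets(
    tts_families=["tts_a", "tts_b"],
    vocoders=["vocoder_x", "vocoder_y", "vocoder_z"],
    temperatures=[0.5, 1.0],
)
print(len(subsets))             # 2 * 3 * 2 = 12 combinations
print(subsets[0].source_label)  # tts_a|vocoder_x|T=0.5
```

Keeping every factor in the label lets a source-tracing model be trained or scored at any granularity, for example by grouping labels on `tts_family` alone for coarse attribution.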