Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio
This paper addresses the problem of locating and categorizing spoofed segments in audio for security and forensics applications, and represents an incremental step in audio spoofing detection.
This paper tackles the novel task of Spoof Diarization in partially spoofed audio, which aims to locate spoofed regions and cluster them by spoofing method, and proposes a benchmark model that reveals the task's high complexity even in restricted scenarios.
This paper defines Spoof Diarization as a novel task in the Partial Spoof (PS) scenario. The task aims to determine "what spoofed when", which includes not only locating spoofed regions but also clustering them according to the spoofing method that produced them. As a pioneering study in spoof diarization, we focus on defining the task, establishing evaluation metrics, and proposing a benchmark model, namely the Countermeasure-Condition Clustering (3C) model. Using this model, we first explore how to effectively train countermeasures to support spoof diarization under three labeling schemes. We then use spoof localization predictions to enhance diarization performance. This first study reveals the high complexity of the task, even in restricted scenarios where only a single speaker per audio file and an oracle number of spoofing methods are considered. Our code is available at https://github.com/nii-yamagishilab/PartialSpoof.
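To make the "locate, then cluster" idea concrete, the following is a minimal sketch and not the 3C model itself. It assumes a countermeasure has already produced one spoof score and one embedding per frame, that a higher score means "more likely spoofed", and that frames are 20 ms apart. The function names (`diarize`, `frames_to_segments`, `kmeans`), the 0.5 threshold, and the use of a small k-means in place of whatever clustering the paper uses are all illustrative assumptions. As in the restricted scenario above, the number of spoofing methods is given as an oracle value.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, label)


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Tiny deterministic k-means (farthest-point init); returns a cluster id per point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return assign


def frames_to_segments(labels: Sequence[str], frame_shift: float = 0.02) -> List[Segment]:
    """Merge runs of identical frame labels into timed segments."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append(
                (round(start * frame_shift, 6), round(i * frame_shift, 6), labels[start])
            )
            start = i
    return segments


def diarize(
    scores: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    n_methods: int,           # oracle number of spoofing methods (assumed known)
    threshold: float = 0.5,   # hypothetical decision threshold
    frame_shift: float = 0.02,  # assumed 20 ms frame shift
) -> List[Segment]:
    """Locate spoofed frames by thresholding, then cluster only those frames."""
    spoof_idx = [i for i, s in enumerate(scores) if s >= threshold]
    labels = ["bonafide"] * len(scores)
    if spoof_idx:
        assign = kmeans([embeddings[i] for i in spoof_idx], n_methods)
        for i, a in zip(spoof_idx, assign):
            labels[i] = f"spoof_{a}"
    return frames_to_segments(labels, frame_shift)


# Toy input: 8 frames, two spoofed regions with clearly different embeddings.
scores = [0.1, 0.1, 0.9, 0.9, 0.1, 0.8, 0.8, 0.1]
embeddings = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 0]]
segments = diarize(scores, embeddings, n_methods=2)
for seg in segments:
    print(seg)
```

The cluster indices (`spoof_0`, `spoof_1`) are arbitrary identifiers, as in speaker diarization: the output says which regions share a spoofing method, not which named method produced them. Using the localization scores to decide which frames enter the clustering step loosely mirrors the abstract's point that spoof localization predictions can support diarization.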