DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild
This work addresses speaker verification that stays secure under spoofing attacks, which is crucial for voice authentication systems. The contribution appears incremental, however, as it builds on existing methods.
The paper tackles the problem of spoofing-aware automatic speaker verification in the wild by proposing a robust framework that combines a spoofing detector and a speaker verification network, achieving competitive results in the WildSpoof Challenge.
This paper presents the DFKI-Speech system developed for the WildSpoof Challenge under the Spoofing-aware Automatic Speaker Verification (SASV) track. We propose a robust SASV framework in which a spoofing detector and a speaker verification (SV) network operate in tandem. The spoofing detector employs a self-supervised speech embedding extractor as the frontend, combined with a state-of-the-art graph neural network backend. In addition, a top-3 layer-based mixture-of-experts (MoE) is used to fuse high-level and low-level features for effective spoofed utterance detection. For speaker verification, we adapt a low-complexity convolutional neural network that fuses 2D and 1D features at multiple scales and is trained with the SphereFace loss. A contrastive circle loss is also applied to adaptively weight positive and negative pairs within each training batch, enabling the network to better distinguish between hard and easy sample pairs. Finally, AS-Norm score normalization with a fixed impostor cohort and model ensembling are used to further enhance the discriminative capability of the speaker verification system.
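To illustrate the score normalization step mentioned in the abstract, the following is a minimal sketch of adaptive symmetric score normalization (AS-Norm) over a fixed impostor cohort. It is not the authors' implementation. Cosine scoring, the function names `cosine_scores` and `as_norm`, and the `top_k` value are assumptions made for illustration; the abstract does not specify the scoring backend or the cohort size.

```python
import numpy as np


def cosine_scores(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between rows of a (N, D) and rows of b (M, D)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def as_norm(enroll: np.ndarray, test: np.ndarray,
            cohort: np.ndarray, top_k: int = 300) -> float:
    """AS-Norm for one trial (hypothetical helper, assumes cosine scoring).

    enroll, test: speaker embeddings of shape (D,)
    cohort: fixed impostor cohort embeddings of shape (C, D)
    top_k: number of closest cohort entries used for the statistics
           (assumed value, not taken from the source)
    """
    raw = float(cosine_scores(enroll[None, :], test[None, :])[0, 0])
    k = min(top_k, cohort.shape[0])

    stats = []
    for emb in (enroll, test):
        # Score this side of the trial against every cohort embedding.
        scores = cosine_scores(emb[None, :], cohort)[0]
        # Adaptive part: keep only the k highest-scoring impostors.
        top = np.sort(scores)[-k:]
        # Small epsilon guards against a zero standard deviation.
        stats.append((top.mean(), top.std() + 1e-8))

    (mu_e, sd_e), (mu_t, sd_t) = stats
    # Symmetric part: average the enrollment-side and test-side z-scores.
    return float(0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cohort = rng.normal(size=(500, 64))      # stand-in impostor cohort
    enroll = rng.normal(size=64)             # stand-in enrollment embedding
    test = enroll + 0.1 * rng.normal(size=64)
    print(as_norm(enroll, test, cohort, top_k=100))
```

In this sketch the cohort is computed once and reused for every trial, which is what "fixed" is taken to mean here. The normalized score is symmetric in the enrollment and test embeddings, so swapping them leaves the result unchanged.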