Integrated Spoofing-Robust Automatic Speaker Verification via a Three-Class Formulation and LLR
This work addresses spoofing attacks on speaker verification systems and offers improved interpretability and adaptability, although the contribution is incremental because it builds on existing integration methods.
The paper tackles spoofing-robust automatic speaker verification by proposing a unified end-to-end framework with a three-class formulation for log-likelihood ratio inference, achieving performance comparable to existing methods on ASVSpoof5 and better results on SpoofCeleb.
Spoofing-robust automatic speaker verification (SASV) aims to integrate automatic speaker verification (ASV) and countermeasure (CM) systems. A popular solution is the fusion of independent ASV and CM scores. To model SASV better, some frameworks integrate ASV and CM within a single network. However, these solutions are typically bi-encoder based, offer limited interpretability, and cannot be readily adapted to new evaluation parameters without retraining. To address these limitations, we propose a unified end-to-end framework via a three-class formulation that enables log-likelihood ratio (LLR) inference from class logits for a more interpretable decision pipeline. Experiments show performance comparable to existing methods on ASVSpoof5 and better results on SpoofCeleb. Visualization and analysis also indicate that the three-class reformulation provides greater interpretability.
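
The idea of inferring an LLR from three class logits can be illustrated with a minimal sketch. It is not the paper's implementation, and everything specific in it is an assumption made for illustration: the class order (target bona fide, non-target bona fide, spoof), the use of training-set class priors to turn softmax posteriors into scaled likelihoods, and the names `sasv_llr`, `train_priors`, `pi_non`, and `pi_spf`. Under those assumptions, the evaluation priors for non-target and spoof trials enter only at inference time, so they can be changed without retraining.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def sasv_llr(logits, train_priors, pi_non, pi_spf):
    """Hypothetical LLR from three-class logits (illustrative only).

    logits       : scores ordered as (target, non-target, spoof) -- assumed order
    train_priors : class proportions seen in training, same order (assumption)
    pi_non       : evaluation prior of non-target bona fide trials
    pi_spf       : evaluation prior of spoof trials
    """
    if pi_non < 0 or pi_spf < 0 or pi_non + pi_spf <= 0:
        raise ValueError("pi_non and pi_spf must be non-negative, not both zero")

    # Log posteriors under the training priors.
    log_post = log_softmax(logits)

    # Bayes' rule: log p(x|c) = log P(c|x) - log P_train(c) + const.
    # The constant is shared by all classes and cancels in the ratio.
    ll_tar, ll_non, ll_spf = (
        lp - math.log(p) for lp, p in zip(log_post, train_priors)
    )

    # The negative hypothesis is a mixture of non-target and spoof,
    # weighted by the (renormalised) evaluation priors.
    total = pi_non + pi_spf
    terms = []
    if pi_non > 0:
        terms.append(math.log(pi_non / total) + ll_non)
    if pi_spf > 0:
        terms.append(math.log(pi_spf / total) + ll_spf)
    m = max(terms)
    ll_neg = m + math.log(sum(math.exp(t - m) for t in terms))

    return ll_tar - ll_neg


# Same logits, two different evaluation settings, no retraining involved.
uniform = [1 / 3, 1 / 3, 1 / 3]
llr_few_spoofs = sasv_llr([1.0, 0.0, 3.0], uniform, pi_non=0.9, pi_spf=0.1)
llr_many_spoofs = sasv_llr([1.0, 0.0, 3.0], uniform, pi_non=0.5, pi_spf=0.5)
```

In this sketch a trial whose spoof logit is high receives a lower LLR when spoofing attacks are assumed to be more frequent, which is the kind of adaptation to new evaluation parameters the abstract refers to. The per-class log-likelihoods are also available as intermediate values, so one can inspect whether a rejection is driven by the non-target or the spoof hypothesis.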