SZU-AFS Antispoofing System for the ASVspoof 5 Challenge
This work addresses spoofing attacks in automatic speaker verification, presenting an incremental improvement through fine-tuning and fusion strategies.
The paper develops the SZU-AFS anti-spoofing system for Track 1 of the ASVspoof 5 Challenge; the final fusion system achieves a minDCF of 0.115 and an EER of 4.04% on the evaluation set.
This paper presents the SZU-AFS anti-spoofing system, designed for Track 1 of the ASVspoof 5 Challenge under open conditions. The system is built in four stages: selecting a baseline model, exploring effective data augmentation (DA) methods for fine-tuning, applying a co-enhancement strategy based on gradient norm aware minimization (GAM) for secondary fine-tuning, and fusing the logit scores of the two best-performing fine-tuned models. The baseline model combines the Wav2Vec2 front-end feature extractor with the AASIST back-end classifier. During fine-tuning, three distinct DA policies were investigated: single-DA, random-DA, and cascade-DA. The GAM-based co-enhancement strategy fine-tunes the augmented model at both the data and optimizer levels and helps the Adam optimizer find flatter minima, thereby improving model generalization. Overall, the final fusion system achieves a minDCF of 0.115 and an EER of 4.04% on the evaluation set.
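The three DA policies and the final score fusion can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the policies in detail, so the sketch assumes that single-DA applies one fixed augmentation, random-DA applies one augmentation drawn at random per utterance, and cascade-DA applies several augmentations in sequence. The toy augmentations (`half_gain`, `clip`, `invert`), the function names, and the weighted-average fusion rule with `weight=0.5` are all hypothetical placeholders.

```python
import random
from typing import Callable, List, Sequence

Waveform = List[float]
Augment = Callable[[Waveform], Waveform]


# Toy augmentations (hypothetical stand-ins for real DA methods).
def half_gain(wave: Waveform) -> Waveform:
    """Scale every sample by 0.5."""
    return [0.5 * x for x in wave]


def clip(wave: Waveform) -> Waveform:
    """Hard-clip samples to the range [-0.5, 0.5]."""
    return [max(-0.5, min(0.5, x)) for x in wave]


def invert(wave: Waveform) -> Waveform:
    """Flip the polarity of the signal."""
    return [-x for x in wave]


def single_da(wave: Waveform, augment: Augment) -> Waveform:
    """Assumed single-DA: apply one fixed augmentation."""
    return augment(wave)


def random_da(wave: Waveform, augments: Sequence[Augment],
              rng: random.Random) -> Waveform:
    """Assumed random-DA: apply one augmentation chosen at random."""
    return rng.choice(augments)(wave)


def cascade_da(wave: Waveform, augments: Sequence[Augment]) -> Waveform:
    """Assumed cascade-DA: apply the augmentations one after another."""
    for augment in augments:
        wave = augment(wave)
    return wave


def fuse_logits(scores_a: Sequence[float], scores_b: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Assumed fusion rule: weighted average of two models' logit scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same trials")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(scores_a, scores_b)]


if __name__ == "__main__":
    wave = [1.0, -0.2, 0.8]
    rng = random.Random(0)  # fixed seed so the random policy is repeatable
    print(single_da(wave, half_gain))
    print(random_da(wave, [half_gain, clip, invert], rng))
    print(cascade_da(wave, [half_gain, invert]))
    print(fuse_logits([2.0, -1.0], [4.0, 1.0]))
```

In a real pipeline the augmentations would operate on audio tensors during fine-tuning, and the fusion weight would be chosen on a development set rather than fixed in advance.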