Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
This work addresses the need for efficient spoofing protection in speaker verification systems on edge devices, representing an incremental improvement in model compression.
The paper tackles the problem of deploying countermeasure models for automatic speaker verification on resource-constrained edge devices. It proposes an adversarial speaker distillation method whose model uses only 22.5% of the parameters and 19.4% of the multiply-accumulate operations of the ResNetSE baseline while achieving 0.2695 min t-DCF and 3.54% EER on the ASVspoof 2021 Logical Access task.
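The EER quoted above is the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (bona fide trials rejected). The following is a minimal sketch, not the official ASVspoof scoring tool. It assumes that higher scores mean "more likely bona fide" and approximates the EER by trying every observed score as a threshold. The function name `compute_eer` is hypothetical.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate by sweeping observed scores as thresholds.

    Assumed convention: a trial is accepted as bona fide when score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(frr - far)
        if gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer


print(compute_eer([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]))  # 0.25
```

Official scoring tools usually interpolate between thresholds. The discrete sweep used here is cruder, but it makes the definition easy to follow.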
The countermeasure (CM) model is developed to protect automatic speaker verification (ASV) systems from spoofing attacks and to prevent the resulting leakage of personal information. For practical and security reasons, the CM model is usually deployed on edge devices. These devices have less computing power and storage space than cloud-based systems, which limits the size of the model. To better balance CM model size against performance, we propose an adversarial speaker distillation method. It improves on knowledge distillation by combining it with generalized end-to-end (GE2E) pre-training and adversarial fine-tuning. In the evaluation phase of the ASVspoof 2021 Logical Access task, our proposed adversarial speaker distillation ResNetSE (ASD-ResNetSE) model reaches 0.2695 min t-DCF and 3.54% EER. It uses only 22.5% of the parameters and 19.4% of the multiply-accumulate operations (MACs) of the ResNetSE model.
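The abstract does not say how distillation, GE2E pre-training, and adversarial fine-tuning are combined. The sketch below therefore illustrates only two standard building blocks, under that caveat. The first is a generic Hinton-style knowledge distillation loss: a temperature-softened KL divergence from teacher to student, mixed with hard-label cross-entropy. The second is the GE2E softmax loss of Wan et al. (2018). All function names and the hyperparameters `temperature`, `alpha`, `w`, and `b` are illustrative assumptions, not the paper's settings. The adversarial fine-tuning step is not sketched.

```python
import math


def softmax(logits, temperature=1.0):
    # Subtract the max for numerical stability before exponentiating.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Generic knowledge distillation loss for one utterance (illustrative values)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy against the hard bona fide / spoof label.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vec(vectors):
    n = len(vectors)
    return [sum(xs) / n for xs in zip(*vectors)]


def ge2e_softmax_loss(batch, w=10.0, b=-5.0):
    """GE2E softmax loss over N speakers x M utterances (M >= 2).

    batch[j][i] is the embedding of utterance i from speaker j.
    In GE2E, w (> 0) and b are learnable; they are fixed here for illustration.
    """
    centroids = [mean_vec(utts) for utts in batch]
    total = 0.0
    for j, utts in enumerate(batch):
        for i, e in enumerate(utts):
            sims = []
            for k in range(len(batch)):
                if k == j:
                    # Leave-one-out centroid: exclude the utterance itself.
                    c = mean_vec(utts[:i] + utts[i + 1:])
                else:
                    c = centroids[k]
                sims.append(w * cosine(e, c) + b)
            # -S_ji,j + log sum_k exp(S_ji,k), computed stably.
            m = max(sims)
            log_sum_exp = m + math.log(sum(math.exp(s - m) for s in sims))
            total += log_sum_exp - sims[j]
    return total
```

Plain Python lists keep the sketch dependency-free. A real training loop would compute these losses on batched tensors in a deep learning framework.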