XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021
This is an incremental improvement for speaker recognition systems in noisy environments.
The paper tackles speaker recognition and diarisation in the VoxCeleb Speaker Recognition Challenge 2021, achieving a diarisation error rate (DER) of 5.54% and a Jaccard error rate (JER) of 27.11% on the evaluation set for track 4.
This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021. For track 2, we evaluate two systems, ResNet34-SE and ECAPA-TDNN. For track 4, an important part of our system is the voice activity detection (VAD) module, which greatly improves performance. Our best track 4 submission obtains a DER of 5.54% and a JER of 27.11% on the evaluation set, and a DER of 2.92% and a JER of 20.84% on the development set.
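For readers unfamiliar with the DER figures quoted above, the following is a minimal sketch of the standard DER definition: the sum of missed speech, false alarm speech, and speaker confusion time, divided by the total reference speech time. It is not the challenge's scoring tool. The function name `diarization_error_rate` and all durations are hypothetical and chosen only for illustration. Since missed speech and false alarm both come from speech/non-speech decisions, the VAD module affects two of the three terms directly.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction.

    All arguments are durations in seconds:
      missed       - reference speech that the system labelled as non-speech
      false_alarm  - non-speech that the system labelled as speech
      confusion    - speech attributed to the wrong speaker
      total_speech - total reference speech time (the denominator)
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations, NOT taken from the paper.
der = diarization_error_rate(
    missed=12.0, false_alarm=8.0, confusion=10.0, total_speech=1000.0
)
print(f"DER = {100 * der:.2f}%")
```

In this toy case, halving the missed and false alarm time while leaving confusion unchanged would lower DER from 3% to 2%, which illustrates why a better VAD can move the metric noticeably.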