UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023
This report describes the UNISOUND submission for Track 1 and Track 2 of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). We submit the same system to both tracks, and it is trained only on VoxCeleb2-dev. We develop large-scale ResNet and RepVGG architectures for the challenge. We also propose a consistency-aware score calibration method, which exploits the stability of audio voiceprints in the similarity score through a Consistency Measure Factor (CMF). CMF brings a substantial performance improvement in this challenge. Our final system is a fusion of six models and ranks first in Track 1 and second in Track 2 of VoxSRC 2023. Our submission achieves a minDCF of 0.0855 and an EER of 1.5880%.
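The results above are reported as minDCF and EER. As a rough illustration of how these two verification metrics are commonly computed from trial scores, here is a minimal NumPy sketch. It is not the UNISOUND scoring code: the function name `compute_eer_mindcf` is hypothetical, and the cost parameters (P_target = 0.05, C_miss = C_fa = 1) are an assumption about the challenge's operating point rather than something stated in this report.

```python
import numpy as np


def compute_eer_mindcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Return (EER, minDCF) for a list of verification trials.

    scores: higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    p_target, c_miss, c_fa: ASSUMED cost parameters; adjust to the evaluation plan.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials by descending score; accepting the top-k trials corresponds
    # to placing the decision threshold just below the k-th highest score.
    order = np.argsort(-scores, kind="mergesort")
    sorted_labels = labels[order]
    n_target = sorted_labels.sum()
    n_nontarget = len(sorted_labels) - n_target

    # Cumulative counts for k = 0..N accepted trials.
    tp = np.concatenate([[0], np.cumsum(sorted_labels)])
    fp = np.concatenate([[0], np.cumsum(1 - sorted_labels)])
    p_miss = 1.0 - tp / n_target  # targets rejected
    p_fa = fp / n_nontarget       # non-targets accepted

    # EER: operating point where the miss and false-alarm rates are closest.
    idx = np.argmin(np.abs(p_miss - p_fa))
    eer = (p_miss[idx] + p_fa[idx]) / 2.0

    # Detection cost at every threshold, normalized by the cost of the
    # best trivial system (always accept or always reject).
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    min_dcf = dcf.min() / c_default
    return eer, min_dcf
```

For example, `compute_eer_mindcf([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])` describes perfectly separated trials and gives an EER and minDCF of 0. Tied scores are not grouped in this sketch, so its results are exact only when all trial scores are distinct.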