ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023
This work addresses speaker recognition, but its contribution is incremental: it builds on existing ResNet architectures and combines them with score fusion and calibration.
The paper tackles speaker recognition in the VoxSRC 2023 challenge by fusing multiple ResNet variants trained on VoxCeleb2 and applying score calibration, achieving a minDCF of 0.1066 and an EER of 1.980%.
This technical report describes the ChinaTelecom system for Track 1 (closed) of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). Our system consists of several ResNet variants trained only on VoxCeleb2, which were later fused for better performance. Score calibration was also applied to each variant and to the fused system. The final submission achieved a minDCF of 0.1066 and an EER of 1.980%.
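The abstract does not say how fusion or calibration was carried out, so the following is only a minimal sketch of one common recipe: score-level fusion as a weighted average of per-system trial scores, followed by an affine logistic calibration fitted on labelled development trials. The function names (`fuse_scores`, `fit_calibration`, `compute_eer`), the equal fusion weights, the gradient-descent fit, and the toy scores are all assumptions made for illustration and are not taken from the report. Only EER is computed here; minDCF is left out.

```python
import numpy as np


def fuse_scores(score_matrix, weights=None):
    """Weighted average of per-system scores; input shape (n_systems, n_trials)."""
    score_matrix = np.asarray(score_matrix, dtype=float)
    if weights is None:
        # Assumption: equal weights when none are supplied.
        n_systems = score_matrix.shape[0]
        weights = np.full(n_systems, 1.0 / n_systems)
    return np.asarray(weights, dtype=float) @ score_matrix


def fit_calibration(scores, labels, lr=0.1, n_iter=2000):
    """Fit a, b so that sigmoid(a * s + b) approximates P(target | s).

    Plain gradient descent on the mean cross-entropy (an illustrative choice).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels  # derivative of the loss w.r.t. the logit
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b


def compute_eer(scores, labels):
    """Approximate EER: the point where false-accept and false-reject rates are closest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores)  # sort trials by descending score
    sorted_labels = labels[order]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    # Rates when the top-k scored trials are accepted, for k = 1..n.
    false_accept = np.cumsum(1 - sorted_labels) / n_nontarget
    false_reject = 1.0 - np.cumsum(sorted_labels) / n_target
    idx = np.argmin(np.abs(false_accept - false_reject))
    return (false_accept[idx] + false_reject[idx]) / 2.0


if __name__ == "__main__":
    # Toy development trials for two hypothetical systems (1 = target, 0 = non-target).
    labels = np.array([1, 1, 1, 0, 0, 0])
    system_a = np.array([0.8, 0.6, 0.3, 0.4, 0.1, 0.0])
    system_b = np.array([0.7, 0.2, 0.5, 0.1, 0.3, 0.0])

    fused = fuse_scores([system_a, system_b])
    a, b = fit_calibration(fused, labels)
    calibrated = a * fused + b  # log-odds-like scores after calibration

    print("EER system A:", compute_eer(system_a, labels))
    print("EER system B:", compute_eer(system_b, labels))
    print("EER fused   :", compute_eer(calibrated, labels))
```

Because the calibration is an increasing affine map, it leaves the ranking of trials, and hence the EER, unchanged. Its purpose is to make scores interpretable at a fixed decision threshold, which matters for cost-based metrics such as DCF evaluated at a chosen operating point.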