Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020
This work offers the speaker recognition community an unofficial baseline system for the VoxCeleb Speaker Recognition Challenge 2020, with released training code and pre-trained models.
The authors analyze ResNet-based speaker recognition models and train several variants with a range of loss functions, reporting significant improvements over most existing works without model ensembling or post-processing.
This report describes our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020. We perform a careful analysis of speaker recognition models based on the popular ResNet architecture, and train a number of variants using a range of loss functions. Our results show significant improvements over most existing works without the use of model ensemble or post-processing. We release the training code and pre-trained models as unofficial baselines for this year's challenge.
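As an illustration of the kind of loss function a speaker recognition model can be trained with, the following is a minimal sketch of an additive-margin softmax loss for a single utterance embedding. This abstract does not name the specific losses used, so the choice of loss, the function name am_softmax_loss, and the margin and scale values are all assumptions made for illustration, not the configuration of the released code.

```python
import math


def am_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Additive-margin softmax loss for one embedding (illustrative sketch).

    embedding     -- list of floats, the utterance embedding
    class_weights -- list of per-speaker weight vectors (same dimension)
    target        -- index of the true speaker
    margin, scale -- hypothetical hyperparameter values, not from the report
    """
    def normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    unit_embedding = normalize(embedding)

    # Cosine similarity between the embedding and each speaker's weight vector.
    cosines = [
        sum(a * b for a, b in zip(unit_embedding, normalize(weights)))
        for weights in class_weights
    ]

    # Subtract the margin from the target-speaker cosine only, then scale.
    logits = [
        scale * (cosine - margin if index == target else cosine)
        for index, cosine in enumerate(cosines)
    ]

    # Cross-entropy via a numerically stable log-sum-exp.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]
```

In this sketch the margin makes the target speaker harder to classify during training, which pushes embeddings of the same speaker closer together than plain softmax would. A real training setup would compute this over batches with an automatic differentiation framework rather than plain Python lists.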