VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge
This work aims to improve speaker recognition technology for researchers and practitioners. It is incremental, building on the two previous editions of the challenge.
The paper addresses speaker recognition and diarisation in unconstrained 'in the wild' data through the VoxSRC 2021 challenge, which provided datasets and evaluation tools. The challenge assessed current technology and introduced a new multi-lingual focus.
The third instalment of the VoxCeleb Speaker Recognition Challenge was held in conjunction with Interspeech 2021. The aim of this challenge was to assess how well current speaker recognition technology is able to diarise and recognise speakers in unconstrained or 'in the wild' data. The challenge consisted of: (i) the provision of publicly available speaker recognition and diarisation data from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2021. This paper outlines the challenge, and describes the baselines, methods and results. We conclude with a discussion on the new multi-lingual focus of VoxSRC 2021, and on the progression of the challenge since the previous two editions.