VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
This challenge provides a benchmark for speaker recognition and diarization technologies, which matters to researchers and developers who work with 'in the wild' audio data.
The VoxSRC 2020 challenge evaluated speaker recognition and diarization technologies on unconstrained YouTube video data. It provided a public dataset, ground truth annotations, and evaluation software, culminating in a workshop at Interspeech 2020.
We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020. The goal of this challenge was to assess how well current speaker recognition technology is able to diarise and recognise speakers in unconstrained or 'in the wild' data. It consisted of: (i) a publicly available speaker recognition and diarisation dataset from YouTube videos, together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2020. This paper outlines the challenge and describes the baselines, methods used, and results. We conclude with a discussion of the progress made since the first installment of the challenge.