The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge
This work addresses speaker diarization for audio processing applications. Its contribution is incremental: it builds on existing methods and targets a specific challenge.
The team tackled speaker diarization in the VoxSRC 2021 challenge by developing a system with multiple components, including voice activity detection (VAD) and target-speaker voice activity detection (TS-VAD), achieving a diarization error rate (DER) of 5.07% on the challenge test set.
This report describes the submission of the DKU-DukeECE-Lenovo team to track 4 of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021. Our system includes a voice activity detection (VAD) model, a speaker embedding model, two clustering-based speaker diarization systems with different similarity measurements, two different overlapped speech detection (OSD) models, and a target-speaker voice activity detection (TS-VAD) model. Our final submission, consisting of five independent systems, achieves a diarization error rate (DER) of 5.07% on the challenge test set.
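For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The minimal sketch below illustrates that arithmetic only. The function name and the durations are hypothetical and are not taken from the report, and challenge-specific scoring details (for example, collar size and overlap handling) are not stated in this section, so they are not modeled here.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in seconds:
      missed       -- reference speech the system labeled as non-speech
      false_alarm  -- non-speech the system labeled as speech
      confusion    -- speech attributed to the wrong speaker
      total_speech -- total reference speech time (the denominator)
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical durations for illustration only (not results from the report).
der = diarization_error_rate(
    missed=12.0, false_alarm=8.0, confusion=10.0, total_speech=600.0
)
print(f"DER = {der:.2%}")  # prints "DER = 5.00%"
```

Under this definition, a DER of 5.07% means that the three error types together amount to about 5% of the reference speech time.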