AS SD · Sep 5, 2021

The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge

Weiqing Wang, Danwei Cai, Qingjian Lin, Lin Yang, Junjie Wang, Jin Wang, Ming Li

arXiv:2109.02002v2 · 8.6 · 32 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses speaker diarization for audio processing applications. It is incremental: it combines and builds on existing methods to target a specific challenge.

The team tackled speaker diarization in the VoxSRC 2021 challenge by developing a system with multiple components, such as voice activity detection (VAD) and target-speaker VAD (TS-VAD), achieving a diarization error rate (DER) of 5.07% on the test set.
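DER sums missed speech, false-alarm speech, and speaker confusion, then divides by the total reference speech time. The sketch below computes a frame-level DER under two stated assumptions: hypothesis speaker labels have already been mapped to reference labels (the optimal mapping step is omitted), and no forgiveness collar is applied. The function name `frame_der` and the set-per-frame input format are hypothetical illustrations, not the challenge's official scoring tool.

```python
from typing import Sequence, Set


def frame_der(ref: Sequence[Set[str]], hyp: Sequence[Set[str]]) -> float:
    """Frame-level diarization error rate (hypothetical helper).

    Each element is the set of speakers active in that frame. Assumes hyp
    labels are already mapped to ref labels and no collar is used.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    total = miss = false_alarm = confusion = 0
    for r, h in zip(ref, hyp):
        n_ref, n_hyp = len(r), len(h)
        n_correct = len(r & h)                      # speakers matched in this frame
        total += n_ref                              # reference speaker-time
        miss += max(0, n_ref - n_hyp)               # too few hypothesis speakers
        false_alarm += max(0, n_hyp - n_ref)        # too many hypothesis speakers
        confusion += min(n_ref, n_hyp) - n_correct  # wrong speaker identity
    return (miss + false_alarm + confusion) / total if total else 0.0
```

For example, with reference frames `[{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]` and hypothesis frames `[{"A"}, {"B"}, {"A"}, {"B"}, {"A"}]`, there is one confusion, one miss (the overlapped frame), and one false alarm over 5 units of reference speaker-time, giving a DER of 0.6.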

This report describes the submission of the DKU-DukeECE-Lenovo team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021 track 4. Our system includes a voice activity detection (VAD) model, a speaker embedding model, two clustering-based speaker diarization systems with different similarity measurements, two different overlapped speech detection (OSD) models, and a target-speaker voice activity detection (TS-VAD) model. Our final submission, consisting of 5 independent systems, achieves a DER of 5.07% on the challenge test set.
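Clustering-based diarization groups per-segment speaker embeddings so that each cluster corresponds to one speaker. The abstract does not say which two similarity measurements the team used, so the sketch below assumes plain cosine similarity with average-linkage agglomerative hierarchical clustering (AHC) as a generic stand-in. The function names and the stopping `threshold` are hypothetical, and the embeddings are toy 2-D vectors rather than real speaker embeddings.

```python
import numpy as np


def cosine_similarity_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between row vectors."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T


def ahc_cluster(emb: np.ndarray, threshold: float) -> np.ndarray:
    """Average-linkage AHC on cosine similarity (assumed, generic variant).

    Merges the most similar pair of clusters until the best average
    similarity drops below `threshold` (a hypothetical tuning parameter).
    Returns one integer cluster label per embedding.
    """
    sim = cosine_similarity_matrix(emb)
    clusters = [[i] for i in range(len(emb))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    labels = np.empty(len(emb), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

In practice the threshold (or a target number of speakers) is tuned on development data; the systems in this report also add overlap handling (OSD, TS-VAD) on top of the clustering output, which this sketch does not attempt.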

