ASSD | Sep 5, 2021

The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge

arXiv:2109.02002v2 | 32 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses speaker diarization for audio processing applications, but it is incremental as it builds on existing methods for a specific challenge.

The team tackled speaker diarization in the VoxSRC 2021 challenge by developing a system with multiple components like VAD and TS-VAD, achieving a diarization error rate (DER) of 5.07% on the test set.

This report describes the submission of the DKU-DukeECE-Lenovo team to track 4 of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021. Our system includes a voice activity detection (VAD) model, a speaker embedding model, two clustering-based speaker diarization systems with different similarity measurements, two different overlapped speech detection (OSD) models, and a target-speaker voice activity detection (TS-VAD) model. Our final submission, consisting of five independent systems, achieves a diarization error rate (DER) of 5.07% on the challenge test set.
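For readers unfamiliar with the metric, DER is conventionally the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. The sketch below illustrates that standard definition only; the function name and the durations are hypothetical and are not taken from the report or the challenge scoring tool.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in seconds. This follows the conventional
    definition of DER; it is not the official challenge scorer.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech


# Hypothetical durations, chosen only for illustration.
der = diarization_error_rate(
    false_alarm=1.0,   # non-speech labelled as speech
    missed=2.0,        # speech that was not detected
    confusion=2.0,     # speech attributed to the wrong speaker
    total_speech=100.0,
)
print(f"DER = {der:.2%}")
```

Official evaluations typically add details such as forgiveness collars around segment boundaries and specific handling of overlapped speech, which this sketch omits.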
