AS · LG · SD · ML · Feb 23, 2020

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

arXiv:2002.12761v2 · 12 citations
AI Analysis

This work addresses speech diarization, a difficult problem of interest to both researchers and practitioners, but it is incremental: it builds on existing modules and techniques.

The paper tackles the DIHARD II speech diarization challenge by developing a system with multiple modules, achieving a diarization error rate (DER) of 18.84% in Track1 and 27.90% in Track2, which represents relative reductions of 27.5% and 31.7% compared to official baselines.

In this paper, we present the system submitted to the second DIHARD Speech Diarization Challenge by the DKU-LENOVO team. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. Our final submission employs ResNet-LSTM based VAD, Deep ResNet based speaker embedding, LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage, and overlap detection also brings a slight improvement. Our proposed system achieves 18.84% DER in Track1 and 27.90% DER in Track2. Although our systems reduce the DERs by 27.5% and 31.7% relative to the official baselines, we believe that the diarization task is still very difficult.
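The relative reductions quoted above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the helper names `implied_baseline` and `relative_reduction` are hypothetical, and the baseline DERs it prints are back-computed from the rounded figures in the abstract rather than taken from the challenge's official results.

```python
def relative_reduction(baseline_der: float, system_der: float) -> float:
    """Relative DER reduction in percent: 100 * (baseline - system) / baseline."""
    return 100.0 * (baseline_der - system_der) / baseline_der


def implied_baseline(system_der: float, reduction_pct: float) -> float:
    """Invert the formula above to recover the baseline DER implied by a
    system DER and a relative reduction (both given in percent)."""
    return system_der / (1.0 - reduction_pct / 100.0)


# Figures quoted in the abstract: (system DER %, relative reduction %).
reported = {"Track1": (18.84, 27.5), "Track2": (27.90, 31.7)}

for track, (der, reduction) in reported.items():
    baseline = implied_baseline(der, reduction)
    # Round trip: the implied baseline reproduces the quoted reduction.
    check = relative_reduction(baseline, der)
    print(f"{track}: implied baseline DER = {baseline:.2f}%, "
          f"relative reduction = {check:.1f}%")
```

Under these assumptions, the implied baselines come out at roughly 26% DER for Track1 and 41% DER for Track2, which means the absolute gains are about 7 and 13 percentage points respectively.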

