AS SD, Feb 6, 2021

The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge

arXiv:2102.03649v1
AI Analysis

This work presents a competitive system for speaker diarization, the task of determining who spoke when, which is a prerequisite for transcribing multi-speaker audio.

This paper describes the DKU-Duke-Lenovo team's submission to the third DIHARD Speech Diarization Challenge. Their system achieved a Diarization Error Rate (DER) of 15.43% for the core evaluation set and 13.39% for the full evaluation set on task 1, and 21.63% for the core evaluation set and 18.90% for the full evaluation set on task 2.
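The DER figures above combine three error types: missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The following is a minimal frame-level sketch of that ratio, not the challenge's official scoring. It assumes one speaker per frame (no overlap), no forgiveness collar, and hypothesis labels already mapped to reference speakers; the function name `frame_der` and the toy label sequences are hypothetical.

```python
def frame_der(reference, hypothesis):
    """Frame-level DER sketch. None marks a non-speech frame.

    Assumes at most one speaker per frame and that hypothesis labels
    have already been mapped onto the reference speaker names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1        # reference speech labeled as silence
            elif hyp != ref:
                confusion += 1     # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1       # silence labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy example (made-up labels): one confusion, one miss, one false alarm
# over eight reference speech frames.
reference = ["A", "A", "A", "B", "B", None, None, "B", "B", "A"]
hypothesis = ["A", "A", "B", "B", "B", "B", None, None, "B", "A"]
der = frame_der(reference, hypothesis)
print(f"DER = {der:.3f}")
```

Because false alarms are counted in the numerator but not in the denominator, a DER computed this way can exceed 100%.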

In this paper, we present the system submitted by the DKU-Duke-Lenovo team to the third DIHARD Speech Diarization Challenge. Our system consists of several modules: voice activity detection (VAD), segmentation, speaker embedding extraction, attentive similarity scoring, and agglomerative hierarchical clustering. In addition, target-speaker VAD (TSVAD) is applied to the phone call data to further improve performance. Our final submitted system achieves a DER of 15.43% on the core evaluation set and 13.39% on the full evaluation set for task 1, and a DER of 21.63% on the core evaluation set and 18.90% on the full evaluation set for task 2.
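To illustrate the final clustering stage of such a pipeline, here is a minimal sketch of agglomerative hierarchical clustering over per-segment speaker embeddings. It is not the authors' implementation: plain cosine similarity stands in for the attentive similarity scoring, average linkage is an assumed merge rule, and the function names, the 0.8 stopping threshold, and the two-dimensional toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ahc(embeddings, threshold):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Merges the most similar pair of clusters until the best average
    similarity falls below `threshold`. Returns one label per segment.
    """
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per segment

    while len(clusters) > 1:
        best, pair = -2.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average similarity over all cross-cluster segment pairs.
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                score = total / (len(clusters[a]) * len(clusters[b]))
                if score > best:
                    best, pair = score, (a, b)
        if best < threshold:
            break  # remaining clusters are treated as distinct speakers
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * n
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Toy embeddings (made up): segments 0, 1, 4 point one way; 2, 3 another.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.0]]
labels = ahc(segments, threshold=0.8)
print(labels)
```

The stopping threshold determines how many speakers are found, so in practice it would be tuned on development data.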
