AS | CL | SD | Feb 2, 2021

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

arXiv:2102.01363v1 | 40 citations
Originality: Incremental advance
AI Analysis

This work gives researchers and practitioners in speech processing a competitive speech diarization system that placed second in every task of the DIHARD III challenge.

This paper describes the Hitachi-JHU system for the DIHARD III Speech Diarization Challenge, which combines five subsystems (x-vector, end-to-end neural, and hybrid) using DOVER-Lap. The combined system achieved diarization error rates of 11.58% and 14.09% in Track 1 full and core, and 16.94% and 20.01% in Track 2 full and core, securing second place in all challenge tasks.

This paper provides a detailed description of the Hitachi-JHU system submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble result of five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refine each subsystem so that all five are competitive and complementary. After DOVER-Lap-based system combination, the system achieved diarization error rates of 11.58% and 14.09% on Track 1 full and core, and 16.94% and 20.01% on Track 2 full and core, respectively. With these results, we won second place in all the tasks of the challenge.
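
For readers unfamiliar with the metric, the diarization error rate quoted above is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The following is a minimal sketch of that arithmetic only. The function name and the durations are hypothetical illustrations, not values from the paper, and this is not the challenge's official scoring tool.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER as a fraction, given durations in seconds.

    All arguments are hypothetical inputs for illustration:
    missed       - reference speech the system failed to detect
    false_alarm  - non-speech the system labeled as speech
    confusion    - speech attributed to the wrong speaker
    total_speech - total reference speech time (the denominator)
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Made-up durations, chosen only to show the calculation.
der = diarization_error_rate(
    missed=6.0, false_alarm=3.0, confusion=3.0, total_speech=100.0
)
print(f"DER = {100 * der:.2f}%")  # prints "DER = 12.00%"
```

Because false alarms count in the numerator but not in the denominator, the rate can exceed 100% on a badly segmented recording.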

