AS CL SD

Feb 2, 2021

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

arXiv:2102.01363v1

Originality: Incremental advance

AI Analysis

This work describes a competitive speaker diarization system for researchers and practitioners in speech processing; it placed second in every task of the DIHARD III challenge.

This paper describes the Hitachi-JHU system for the DIHARD III Speech Diarization Challenge, which combines five subsystems (x-vector, end-to-end neural, and hybrid) using DOVER-Lap. The combined system achieved diarization error rates of 11.58% and 14.09% in Track 1 full and core, and 16.94% and 20.01% in Track 2 full and core, securing second place in all challenge tasks.

This paper provides a detailed description of the Hitachi-JHU system submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble result of five subsystems: two x-vector-based subsystems, two end-to-end neural diarization-based subsystems, and one hybrid subsystem. We refined each subsystem so that all five are competitive and complementary. After DOVER-Lap-based system combination, the system achieved diarization error rates of 11.58% and 14.09% on Track 1 full and core, and 16.94% and 20.01% on Track 2 full and core, respectively. With these results, we won second place in all tasks of the challenge.
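For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a diarization error rate (DER) can be computed at the frame level. It is not the challenge's scoring tool: the function name `frame_der` and the toy data are hypothetical, and the sketch assumes that hypothesis speaker labels have already been mapped to reference labels, that every frame has equal duration, and that overlapped speech is scored with no forgiveness collar.

```python
def frame_der(reference, hypothesis):
    """Frame-level DER sketch (hypothetical helper, not the official scorer).

    reference, hypothesis: equal-length lists with one set of active
    speaker labels per frame. Hypothesis labels are assumed to be
    already mapped onto reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    total_ref = 0      # total reference speaker-frames (the denominator)
    missed = 0         # reference speakers with no hypothesis speaker
    false_alarm = 0    # hypothesis speakers with no reference speaker
    confusion = 0      # speech attributed to the wrong speaker

    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        total_ref += n_ref
        missed += max(0, n_ref - n_hyp)
        false_alarm += max(0, n_hyp - n_ref)
        confusion += min(n_ref, n_hyp) - n_correct

    if total_ref == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total_ref


# Toy example: five frames, two speakers, one overlapped frame.
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"B"}, {"A"}, {"B"}, {"B"}]
print(f"DER = {frame_der(reference, hypothesis):.2%}")
```

In the toy example, one frame is confused, one overlapped speaker is missed, and one frame is a false alarm, against five reference speaker-frames. A reported DER such as 11.58% is the same ratio computed over the whole evaluation set.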
