AS SD · Jun 8, 2021

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection

Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu

arXiv:2106.04078v1 · 8.631 citations

Originality: Incremental advance

AI Analysis

This work addresses speaker diarization, the task of determining who spoke when in an audio recording, and is an incremental improvement over existing end-to-end methods.

The paper tackles speaker diarization by proposing a conditional multitask learning method that improves performance over conventional end-to-end neural systems, specifically reducing diarization error rates.

We present a conditional multitask learning method for end-to-end neural speaker diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. To further improve the EEND system, we propose a novel multitask learning framework that solves speaker diarization and a desired subtask while explicitly considering the task dependency. Based on the probabilistic chain rule, we optimize speaker diarization conditioned on speech activity detection and overlap detection, which are subtasks of speaker diarization. Experimental results show that the proposed method can leverage a subtask to effectively model speaker diarization and outperforms conventional EEND systems in terms of diarization error rate.
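
To illustrate why speech activity and overlap detection count as subtasks of diarization, the minimal sketch below derives both label sequences from frame-level diarization labels. It is not the authors' code: the function name `subtask_labels`, the list-of-lists label layout, and the toy example are assumptions made for illustration only.

```python
def subtask_labels(diarization):
    """Derive frame-level subtask labels from diarization labels.

    diarization: list of frames, each a list of 0/1 flags, one per speaker
                 (hypothetical layout chosen for this sketch).
    Returns (speech_activity, overlap): two lists of 0/1 flags, one per frame.
    """
    speech_activity = []
    overlap = []
    for frame in diarization:
        active = sum(frame)  # number of speakers talking in this frame
        speech_activity.append(1 if active >= 1 else 0)  # anyone speaking
        overlap.append(1 if active >= 2 else 0)          # two or more speaking
    return speech_activity, overlap


# Toy example: 4 frames, 2 speakers.
labels = [[0, 0], [1, 0], [1, 1], [0, 1]]
sad, ovl = subtask_labels(labels)
```

Because both sequences are fully determined by the diarization labels, they need no extra annotation. One plausible reading of the chain-rule conditioning, assumed here rather than quoted from the abstract, is to predict speech activity first, then overlap given speech activity, then the per-speaker labels given both.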
