Nov 4, 2019

Supervised online diarization with sample mean loss for multi-domain data

arXiv:1911.01266v3 · 26 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker diarization for multi-domain audio data, presenting incremental improvements over existing supervised approaches.

The paper tackles speaker diarization by proposing modifications to the UIS-RNN model, including a novel Sample Mean Loss and improved modeling of speaker turns, which together improve learning efficiency and diarization performance. On the DIHARD II multi-domain dataset, the online method performs similarly to an offline agglomerative clustering baseline with PLDA scoring.

Recently, a fully supervised speaker diarization approach, UIS-RNN, was proposed, which models speakers using multiple instances of a parameter-sharing recurrent neural network. In this paper we propose qualitative modifications to the model that significantly improve the learning efficiency and the overall diarization performance. In particular, we introduce a novel loss function, which we call the Sample Mean Loss, and we present a better model of speaker turn behaviour by devising an analytical expression for the probability of a new speaker joining the conversation. In addition, we demonstrate that our model can be trained on fixed-length speech segments, removing the need for speaker change information at inference time. Using x-vectors as input features, we evaluate the proposed approach on the multi-domain dataset employed in the DIHARD II challenge: our online method improves on the original UIS-RNN and achieves performance similar to an offline agglomerative clustering baseline using PLDA scoring.
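The abstract names the Sample Mean Loss and training on fixed-length segments but does not define either. The sketch below is an assumption drawn from the name only, not the authors' formulation: it treats the loss as the mean squared distance between each per-frame network output and the sample mean of the x-vectors of the speaker active in that frame, computed within one fixed-length training segment. The helper names (`fixed_length_segments`, `speaker_sample_means`, `sample_mean_loss`) are hypothetical, the network outputs are passed in as plain lists instead of coming from a recurrent network, and dropping a trailing partial window is an arbitrary choice.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = List[float]


def fixed_length_segments(
    embeddings: Sequence[Vector], labels: Sequence[Hashable], seg_len: int
) -> List[Tuple[List[Vector], List[Hashable]]]:
    """Split a labelled embedding sequence into consecutive windows of seg_len frames.

    Hypothetical helper: any trailing partial window is dropped (an assumption).
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    segments = []
    for start in range(0, len(embeddings) - seg_len + 1, seg_len):
        end = start + seg_len
        segments.append((list(embeddings[start:end]), list(labels[start:end])))
    return segments


def speaker_sample_means(
    embeddings: Sequence[Vector], labels: Sequence[Hashable]
) -> Dict[Hashable, Vector]:
    """Return the mean embedding of each speaker that appears in the segment."""
    sums: Dict[Hashable, Vector] = {}
    counts: Dict[Hashable, int] = {}
    for x, spk in zip(embeddings, labels):
        if spk not in sums:
            sums[spk] = [0.0] * len(x)
            counts[spk] = 0
        sums[spk] = [a + b for a, b in zip(sums[spk], x)]
        counts[spk] += 1
    return {spk: [v / counts[spk] for v in s] for spk, s in sums.items()}


def squared_error(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_mean_loss(
    predictions: Sequence[Vector],
    embeddings: Sequence[Vector],
    labels: Sequence[Hashable],
) -> float:
    """Assumed form of a sample-mean target: average squared distance between each
    frame's prediction and the sample mean of its speaker's embeddings."""
    if not (len(predictions) == len(embeddings) == len(labels)):
        raise ValueError("predictions, embeddings and labels must align")
    if not predictions:
        raise ValueError("segment is empty")
    means = speaker_sample_means(embeddings, labels)
    total = sum(squared_error(p, means[spk]) for p, spk in zip(predictions, labels))
    return total / len(predictions)


if __name__ == "__main__":
    emb = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [5.0, 5.0]]
    lab = ["A", "A", "B", "B", "A"]
    segs = fixed_length_segments(emb, lab, seg_len=2)  # last frame is dropped
    first_emb, first_lab = segs[0]
    print(speaker_sample_means(first_emb, first_lab))  # {'A': [2.0, 0.0]}
    print(sample_mean_loss([[0.0, 0.0], [2.0, 0.0]], first_emb, first_lab))  # 2.0
```

Regressing onto a per-speaker mean, rather than onto the next noisy embedding, gives each frame a lower-variance target; whether that matches the paper's exact loss should be checked against the paper and its released code.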

Code Implementations: 1 repo
