AS, LG, SD, SP | Jan 30, 2024

Online speaker diarization of meetings guided by speech separation

arXiv:2402.00067v1 | 8 citations | h-index: 30 | ICASSP
Originality: Incremental advance
AI Analysis

This work addresses the problem that a variable number of speakers in real meeting recordings poses for separation-guided diarization systems. It is an incremental advance, since it builds on existing separation networks.

The authors tackle overlapped speech in online speaker diarization of meetings by introducing a speech separation-guided scheme. It achieves state-of-the-art performance on the AMI headset mix, with the largest gains on overlapped speech sections.

Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with realistic data because they are trained on simulated mixtures with a fixed number of speakers. In this work, we introduce a new speech separation-guided diarization scheme suitable for the online speaker diarization of long meeting recordings with a variable number of speakers, as present in the AMI corpus. We consider ConvTasNet and DPRNN as alternative separation networks, with two or three output sources. To obtain the speaker diarization result, voice activity detection is applied to each estimated source. The final model is fine-tuned end-to-end, after first adapting the separation to real data using AMI. The system operates on short segments, and inference is performed by stitching the local predictions using speaker embeddings and incremental clustering. The results show that our system improves on the state of the art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech). Finally, we show that our system is particularly strong on overlapped speech sections.
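The inference procedure described above (per-segment separation, voice activity detection on each estimated source, then stitching with speaker embeddings and incremental clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a frame-energy threshold stands in for the learned VAD, toy 2-D vectors stand in for real speaker embeddings, and greedy cosine-similarity clustering with a fixed threshold stands in for whatever clustering rule the paper uses. The names `energy_vad`, `IncrementalClusterer`, and `stitch`, and the `threshold` values, are hypothetical.

```python
import math
from typing import AbstractSet, Dict, List, Sequence, Set, Tuple


def energy_vad(samples: Sequence[float], frame_len: int, threshold: float) -> List[int]:
    """Hypothetical stand-in for the learned VAD: 1 if mean frame energy >= threshold."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        labels.append(1 if energy >= threshold else 0)
    return labels


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class IncrementalClusterer:
    """Greedy online clustering of speaker embeddings (assumed rule, not the paper's)."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def assign(self, emb: Sequence[float], exclude: AbstractSet[int] = frozenset()) -> int:
        # Find the most similar known speaker not already used in this segment.
        best, best_sim = -1, -math.inf
        for k, centroid in enumerate(self.centroids):
            if k in exclude:
                continue
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= self.threshold:
            # Known speaker: update its centroid with a running mean.
            n = self.counts[best]
            self.centroids[best] = [(c * n + e) / (n + 1)
                                    for c, e in zip(self.centroids[best], emb)]
            self.counts[best] = n + 1
            return best
        # Otherwise open a new global speaker.
        self.centroids.append(list(emb))
        self.counts.append(1)
        return len(self.centroids) - 1


def stitch(segments: List[List[Tuple[Sequence[float], List[int]]]],
           clusterer: IncrementalClusterer) -> Dict[int, List[int]]:
    """Concatenate per-source VAD outputs of consecutive segments into global tracks.

    Each segment is a non-empty list of (embedding, frame_activity) pairs, one per
    separated output source; all activities within a segment have the same length.
    """
    tracks: Dict[int, List[int]] = {}
    offset = 0  # number of frames already processed
    for segment in segments:
        n_frames = len(segment[0][1])
        used: Set[int] = set()
        for emb, activity in segment:
            if not any(activity):
                continue  # silent output source: nothing to attribute
            gid = clusterer.assign(emb, exclude=used)
            used.add(gid)
            track = tracks.setdefault(gid, [])
            track.extend([0] * (offset - len(track)))  # pad frames where speaker was absent
            track.extend(activity)
        offset += n_frames
    for track in tracks.values():
        track.extend([0] * (offset - len(track)))  # pad trailing absence
    return tracks


# Toy example: two segments of 8 samples (4 frames of 2 samples), two sources each.
clusterer = IncrementalClusterer(threshold=0.7)
seg1 = [([1.0, 0.0], energy_vad([0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0], 2, 0.1)),
        ([0.0, 1.0], energy_vad([0, 0, 0, 0, 0.5, -0.5, 0.5, -0.5], 2, 0.1))]
# In the next segment the separator emits the same speakers in swapped order.
seg2 = [([0.1, 0.9], energy_vad([0.5, 0.5, 0, 0, 0, 0, 0, 0], 2, 0.1)),
        ([0.9, 0.1], energy_vad([0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], 2, 0.1))]
result = stitch([seg1, seg2], clusterer)
print(result)  # {0: [1, 1, 0, 0, 0, 1, 1, 1], 1: [0, 0, 1, 1, 1, 0, 0, 0]}
```

The `exclude` set encodes one design choice of this sketch: within a segment, two separated sources are assumed to carry different speakers, so they are never merged into the same global speaker. The example also shows why stitching is needed at all: the separator's output order is arbitrary from one segment to the next, and only the embeddings let the second segment's sources be matched back to the speakers found earlier.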

Code implementations: 1 repo