SD · AI · Jun 17, 2025

Exploring Speaker Diarization with Mixture of Experts

arXiv:2506.14750v1 · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the problem of accurately identifying and segmenting speakers in noisy, multi-speaker audio, for applications such as transcription and meeting analysis, and represents a strong incremental improvement.

The paper tackles speaker diarization by proposing a neural system with memory-aware embeddings and a mixture-of-experts module, achieving state-of-the-art results on multiple acoustically complex datasets such as CHiME-6 and DIHARD-III.

In this paper, we propose NSD-MS2S, a novel neural speaker diarization system that integrates a memory-aware multi-speaker embedding module with a sequence-to-sequence (Seq2Seq) architecture. The system leverages a memory module to enhance speaker embeddings and employs a Seq2Seq framework to efficiently map acoustic features to speaker labels. Additionally, we explore the application of mixture of experts in speaker diarization and introduce a Shared and Soft Mixture of Experts (SS-MoE) module to further mitigate model bias and enhance performance. Incorporating SS-MoE yields the extended model NSD-MS2S-SSMoE. Experiments on multiple complex acoustic datasets, including the CHiME-6, DiPCo, Mixer 6, and DIHARD-III evaluation sets, demonstrate meaningful improvements in robustness and generalization. The proposed methods achieve state-of-the-art results, showcasing their effectiveness in challenging real-world scenarios.
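The abstract does not specify how SS-MoE is built. As a rough illustration of the general idea only, the following Python sketch assumes one plausible reading of "shared and soft": a shared expert that processes every frame, plus several experts whose outputs are mixed by a dense softmax gate instead of top-k routing. The class `SharedSoftMoE`, its random linear-plus-ReLU experts, and the frame dimension are hypothetical and are not the authors' implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedSoftMoE:
    """Hypothetical shared + soft-gated MoE layer (illustration only).

    y = shared(x) + sum_i g_i(x) * expert_i(x), with g = softmax(W_g x).
    Every expert is evaluated for every frame (soft gating, no top-k).
    """

    def __init__(self, dim: int, num_experts: int, seed: int = 0):
        rng = random.Random(seed)
        self.shared = random_matrix(dim, dim, rng)  # always-on shared expert
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]
        self.gate = random_matrix(num_experts, dim, rng)  # gating projection W_g

    def gate_weights(self, x: Vector) -> Vector:
        # One positive weight per expert; the weights sum to 1.
        return softmax(matvec(self.gate, x))

    def forward(self, x: Vector) -> Vector:
        weights = self.gate_weights(x)
        out = relu(matvec(self.shared, x))  # shared path, same for all frames
        for g, w in zip(weights, self.experts):
            expert_out = relu(matvec(w, x))
            out = [o + g * e for o, e in zip(out, expert_out)]
        return out


if __name__ == "__main__":
    layer = SharedSoftMoE(dim=4, num_experts=3)
    frame = [0.5, -1.0, 0.25, 2.0]  # a made-up 4-dim acoustic frame embedding
    print(layer.gate_weights(frame))
    print(layer.forward(frame))
```

Because every expert receives a nonzero gate weight, no frame depends on a single expert, and the shared expert gives all frames a common path. This is one way such a design could reduce over-reliance on any one expert; the paper's actual SS-MoE may differ.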
