Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
This work improves speaker diarization for applications such as transcription and meeting analysis; it is a strong, targeted gain rather than a broad paradigm shift.
The paper addresses two challenges in speaker diarization, complex speaker embedding distributions and overlapping speech segments, by proposing an overlapping community detection method that reduces the Diarization Error Rate to 15.94% on the DIHARD-III dataset without oracle VAD.
In speaker diarization, traditional clustering-based methods remain widely used in real-world applications. However, these methods struggle with the complex distribution of speaker embeddings and with overlapping speech segments. To address these limitations, we propose an Overlapping Community Detection method based on Graph Attention networks and the Label Propagation Algorithm (OCDGALP). The proposed framework comprises two key components: (1) a graph attention network that refines speaker embeddings and node connections by aggregating information from neighboring nodes, and (2) a label propagation algorithm that assigns multiple community labels to each node, enabling simultaneous clustering and overlapping community detection. Experimental results show that the proposed method significantly reduces the Diarization Error Rate (DER), achieving a state-of-the-art 15.94% DER on the DIHARD-III dataset without oracle Voice Activity Detection (VAD), and 11.07% with oracle VAD.
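To make the idea of assigning multiple community labels to a node concrete, the following is a minimal sketch of a generic seed-clamped label propagation with a membership threshold. It is not the OCDGALP algorithm itself: the abstract does not specify the propagation rule, so the function name `overlapping_label_propagation`, the `seeds` dictionary, the `threshold` value, and the toy affinity matrix are all illustrative assumptions. The affinity matrix stands in for the graph that, in the paper, would come from the graph attention network.

```python
import numpy as np


def overlapping_label_propagation(affinity, seeds, n_iter=50, threshold=0.4):
    """Generic soft label propagation (an assumption, not the paper's rule).

    affinity : (n, n) non-negative symmetric matrix of segment similarities.
    seeds    : dict mapping a node index to its known community index.
    Returns the soft membership matrix and, per node, the set of communities
    whose normalized membership is at least `threshold`.
    """
    affinity = np.asarray(affinity, dtype=float)
    n_nodes = affinity.shape[0]
    n_comms = max(seeds.values()) + 1

    # Row-normalize so each node averages its neighbors' memberships.
    row_sums = affinity.sum(axis=1, keepdims=True)
    transition = affinity / np.maximum(row_sums, 1e-12)

    member = np.zeros((n_nodes, n_comms))
    for node, comm in seeds.items():
        member[node, comm] = 1.0

    for _ in range(n_iter):
        member = transition @ member
        # Clamp seed nodes back to their known community after each step.
        for node, comm in seeds.items():
            member[node] = 0.0
            member[node, comm] = 1.0

    # Normalize rows, then keep every community above the threshold,
    # which allows a node to carry more than one label.
    member = member / np.maximum(member.sum(axis=1, keepdims=True), 1e-12)
    labels = [set(np.flatnonzero(row >= threshold).tolist()) for row in member]
    return member, labels


# Toy graph (hypothetical): two triangles {0, 1, 2} and {4, 5, 6},
# plus node 3 connected to every node in both triangles.
affinity = np.zeros((7, 7))
edges = [(0, 1), (0, 2), (1, 2), (4, 5), (4, 6), (5, 6),
         (3, 0), (3, 1), (3, 2), (3, 4), (3, 5), (3, 6)]
for i, j in edges:
    affinity[i, j] = affinity[j, i] = 1.0

member, labels = overlapping_label_propagation(affinity, seeds={0: 0, 6: 1})
print(labels)
```

In this toy graph, node 3 sits equally between the two groups and ends up with both labels, loosely analogous to a segment in which two speakers overlap, while the remaining nodes keep a single label. The threshold controls how readily a node is marked as overlapping; how the paper makes that decision is not stated in the abstract.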