Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation
This work addresses speaker diarization for speech processing researchers by introducing a novel integration of semantic cues, representing an incremental improvement over existing methods.
The paper tackled the problem of speaker diarization by integrating semantic information from speech content to improve clustering-based systems, resulting in consistent performance gains over acoustic-only methods as demonstrated on a public dataset.
Speaker diarization has gained considerable attention within speech processing research community. Mainstream speaker diarization rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Considering the fact that speech signals can efficiently convey the content of a speech, it is of our interest to fully exploit these semantic cues utilizing language models. In this work we propose a novel approach to effectively leverage semantic information in clustering-based speaker diarization systems. Firstly, we introduce spoken language understanding modules to extract speaker-related semantic information and utilize these information to construct pairwise constraints. Secondly, we present a novel framework to integrate these constraints into the speaker diarization pipeline, enhancing the performance of the entire system. Extensive experiments conducted on the public dataset demonstrate the consistent superiority of our proposed approach over acoustic-only speaker diarization systems.