CL SD AS

May 22, 2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

arXiv:2305.12927v1 26.4224 citations

Originality: Incremental advance

AI Analysis

This work addresses speaker diarization for multi-party scenarios like meetings, offering an incremental improvement by integrating semantic information.

The paper tackles the degradation of speaker diarization performance under adverse acoustic conditions by extracting speaker-related information from the semantic content of multi-party meetings, and reports consistent improvements over acoustic-only systems on the AISHELL-4 and AliMeeting datasets.

Speaker diarization (SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in performance degradation under adverse acoustic conditions. In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization. We introduce two sub-tasks, Dialogue Detection and Speaker-Turn Detection, in which we effectively extract speaker information from conversational semantics. We also propose a simple yet effective algorithm to jointly model acoustic and semantic information and obtain speaker-identified texts. Experiments on both the AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.
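
The abstract does not describe the joint algorithm itself, so the following is only a toy sketch of the general idea of letting text-derived speaker-turn points correct noisy acoustic labels. It is not the paper's method. The function name `smooth_by_turns`, the per-word label input, and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter


def smooth_by_turns(word_speakers, turn_after):
    """Relabel words so that each text segment has a single speaker.

    word_speakers: per-word speaker labels from an acoustic-only system
                   (hypothetical input format).
    turn_after:    set of word indices after which a text-based
                   speaker-turn detector predicts a speaker change
                   (hypothetical input format).
    """
    result = []
    start = 0
    n = len(word_speakers)
    for i in range(n):
        # Close the current segment at a predicted turn or at the last word.
        if i in turn_after or i == n - 1:
            segment = word_speakers[start:i + 1]
            # Assumption: the most frequent acoustic label wins the segment.
            majority = Counter(segment).most_common(1)[0][0]
            result.extend([majority] * len(segment))
            start = i + 1
    return result


# Acoustic labels with two isolated errors; one turn predicted after word 3.
acoustic = ["A", "A", "B", "A", "B", "B", "A", "B"]
smoothed = smooth_by_turns(acoustic, {3})
```

In this example the two isolated mislabeled words are overwritten by the majority label of their segment, which is the kind of correction semantic turn boundaries could in principle provide when the acoustic signal is unreliable.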
