CL, SD, AS | May 22, 2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

arXiv:2305.12927v1224 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker diarization for multi-party scenarios like meetings, offering an incremental improvement by integrating semantic information.

The paper tackled the problem of speaker diarization performance degradation in adverse acoustic conditions by extracting speaker-related information from semantic content in multi-party meetings, resulting in consistent improvements over acoustic-only systems on AISHELL-4 and AliMeeting datasets.

Speaker diarization (SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in performance degradation under adverse acoustic conditions. In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization. We introduce two sub-tasks, Dialogue Detection and Speaker-Turn Detection, in which we effectively extract speaker information from conversational semantics. We also propose a simple yet effective algorithm to jointly model acoustic and semantic information and obtain speaker-identified texts. Experiments on both the AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.
