AS, CL, SD | Apr 13, 2020

Speaker Diarization with Lexical Information

arXiv:2004.06756v1 | 34 citations
AI Analysis

This work addresses speaker diarization for speech processing applications, offering an incremental improvement by combining lexical and acoustic cues without manual transcriptions.

The paper tackles speaker diarization by integrating lexical information from automatic speech recognition with acoustic data, improving accuracy over baseline systems that use only acoustic information.

This work presents a novel speaker diarization approach that leverages lexical information provided by automatic speech recognition. We propose a speaker diarization system that incorporates word-level speaker turn probabilities, together with speaker embeddings, into the speaker clustering process to improve overall diarization accuracy. To integrate lexical and acoustic information comprehensively during clustering, we introduce an adjacency matrix integration for spectral clustering. Since the words and word boundary information used for word-level speaker turn probability estimation are provided by a speech recognition system, the proposed method requires no manual transcription. We show that the proposed method improves diarization performance on various evaluation datasets compared to a baseline diarization system that uses only the acoustic information in speaker embeddings.
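
The adjacency matrix integration described in the abstract can be illustrated with a minimal sketch. The abstract does not give the integration rule, so everything here is an assumption for illustration rather than the authors' method: the acoustic matrix is rescaled cosine similarity between segment embeddings, the lexical matrix holds the probability that no speaker turn occurs between two segments, the two are merged by a convex combination with a hypothetical `weight`, and clustering is restricted to two speakers using the sign of the Fiedler vector. All function names and the toy data are hypothetical.

```python
import numpy as np


def acoustic_adjacency(embeddings):
    """Cosine similarity between segment embeddings, rescaled to [0, 1]."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return (unit @ unit.T + 1.0) / 2.0


def lexical_adjacency(turn_probs):
    """Assumed lexical affinity: probability that no turn separates i and j.

    turn_probs[k] is the estimated probability of a speaker turn between
    segment k and segment k + 1 (e.g. derived from ASR word output).
    """
    n = len(turn_probs) + 1
    adj = np.eye(n)
    for i in range(n):
        no_turn = 1.0
        for j in range(i + 1, n):
            no_turn *= 1.0 - turn_probs[j - 1]
            adj[i, j] = adj[j, i] = no_turn
    return adj


def integrate(acoustic, lexical, weight=0.5):
    """Hypothetical integration rule: convex combination of both matrices."""
    return (1.0 - weight) * acoustic + weight * lexical


def spectral_bipartition(adjacency):
    """Two-way spectral clustering via the sign of the Fiedler vector."""
    d_inv_sqrt = 1.0 / np.sqrt(adjacency.sum(axis=1))
    normalized = d_inv_sqrt[:, None] * adjacency * d_inv_sqrt[None, :]
    laplacian = np.eye(len(adjacency)) - normalized
    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs[:, 1] > 0).astype(int)


# Toy input: six segments; the two middle ones are acoustically ambiguous,
# while the turn probabilities point to a speaker change after segment 2.
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.2], [0.6, 0.5],
    [0.5, 0.6], [0.2, 0.9], [0.1, 1.0],
])
turn_probs = np.array([0.05, 0.05, 0.9, 0.05, 0.05])

combined = integrate(acoustic_adjacency(embeddings),
                     lexical_adjacency(turn_probs))
labels = spectral_bipartition(combined)
```

In this toy setup the lexical term raises the affinity between segments that lie on the same side of the likely turn. Which cluster receives label 0 or 1 is arbitrary, since the sign of an eigenvector is not fixed. Handling more than two speakers would require clustering several eigenvectors (for example with k-means), which this sketch leaves out.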

