CL, Sep 22, 2025

Interactive Real-Time Speaker Diarization Correction with Human Feedback

arXiv:2509.18377v1, 2 citations, h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate speaker diarization in human-in-the-loop workflows, offering incremental improvements through interactive correction techniques.

The paper tackles speaker diarization errors in automatic speech processing by proposing an LLM-assisted system that lets users correct speaker attributions in real time with verbal feedback. In LLM-driven simulations on the AMI test set, the system reduces diarization error rate by 9.92% and speaker confusion error by 44.23%.

Most automatic speech processing systems operate in "open loop" mode without user feedback about who said what, yet human-in-the-loop workflows can potentially enable higher accuracy. We propose an LLM-assisted speaker diarization correction system that lets users fix speaker attribution errors in real time. The pipeline performs streaming ASR and diarization, uses an LLM to deliver concise summaries to users, and accepts brief verbal feedback that is immediately incorporated without disrupting interactions. We also develop two techniques to make the workflow more effective. First, a split-when-merged (SWM) technique detects and splits multi-speaker segments that the ASR erroneously attributes to a single speaker. Second, online speaker enrollments are collected from users' diarization corrections, helping to prevent future speaker diarization errors. LLM-driven simulations on the AMI test set indicate that our system reduces diarization error rate (DER) by 9.92% and speaker confusion error by 44.23%. We further analyze correction efficacy under different settings, including summary versus full-transcript display, the limit on the number of online enrollments, and correction frequency.
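For readers unfamiliar with the two reported metrics: DER is conventionally the sum of missed speech, false-alarm speech, and speaker confusion time, divided by total reference speech time, so speaker confusion is one component of DER. The sketch below illustrates that standard definition only; it is not the paper's evaluation code. It assumes one speaker label per fixed-length frame (no overlapping speech, no forgiveness collar), and the names `der_components` and `der` are hypothetical. Hypothesis labels are matched to reference labels by brute-force search, which is only practical for a handful of speakers.

```python
from itertools import permutations


def der_components(ref, hyp):
    """Return (miss, false_alarm, confusion, total_ref_speech) in frames.

    ref, hyp: equal-length lists of per-frame speaker labels; None means silence.
    Assumption: at most one speaker per frame (overlap is not modelled).
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")

    # Missed speech: reference has a speaker, hypothesis says silence.
    miss = sum(1 for r, h in zip(ref, hyp) if r is not None and h is None)
    # False alarm: hypothesis has a speaker, reference says silence.
    false_alarm = sum(1 for r, h in zip(ref, hyp) if r is None and h is not None)
    # Frames where both sides claim speech; only these can be confused.
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]

    ref_speakers = sorted({r for r in ref if r is not None})
    hyp_speakers = sorted({h for h in hyp if h is not None})
    # Pad with None so surplus hypothesis speakers can stay unmatched.
    targets = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))

    # Pick the one-to-one label mapping that minimises confusion.
    confusion = len(both)
    for perm in permutations(targets, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        errors = sum(1 for r, h in both if mapping[h] != r)
        confusion = min(confusion, errors)

    total = sum(1 for r in ref if r is not None)
    return miss, false_alarm, confusion, total


def der(ref, hyp):
    """Diarization error rate as a fraction of reference speech frames."""
    miss, false_alarm, confusion, total = der_components(ref, hyp)
    return (miss + false_alarm + confusion) / total


# Toy example: 8 frames, reference speakers A/B, hypothesis speakers x/y.
ref = ["A", "A", "A", "B", "B", None, "B", "B"]
hyp = ["x", "x", "y", "y", "y", "y", None, "y"]
print(der_components(ref, hyp), der(ref, hyp))
```

In this toy case the best mapping is x to A and y to B, leaving one missed frame, one false-alarm frame, and one confused frame out of seven reference speech frames. A correction that fixes only the speaker label of a segment lowers the confusion term without touching miss or false alarm, which is one way the confusion figure can move much more than overall DER.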
