CL · Sep 14, 2023

Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version)

arXiv:2309.07677v1 · 2 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses the need for more comprehensive evaluation of speaker diarization, a problem relevant to researchers and developers of speaker diarization and dialogue systems. It is incremental in that it builds on existing metrics.

The paper tackles limitations in evaluating text-based speaker diarization by proposing two new metrics—Text-based Diarization Error Rate and Diarization F1—that use utterance- and word-level token alignment to capture more error types, and introduces tools (align4d and TranscribeView) for alignment and visualization.
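Word-level evaluation of this kind starts by aligning reference and hypothesis tokens. The Python sketch below is a simplified illustration, not the paper's Text-based Diarization Error Rate or Diarization F1 definitions. It aligns two single-stream transcripts with a standard Levenshtein dynamic program, then reports the fraction of word-matched pairs whose speaker labels disagree. The names `align_words` and `speaker_attribution_error` are hypothetical, and the sketch assumes hypothesis speaker labels have already been mapped onto reference labels.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, speaker label)
Pair = Tuple[Optional[Token], Optional[Token]]  # None marks a gap


def align_words(ref: List[Token], hyp: List[Token]) -> List[Pair]:
    """Levenshtein alignment on words only; speaker labels are ignored here."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum edits to align ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = int(ref[i - 1][0] != hyp[j - 1][0])
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # deletion: word missing from hyp
                             cost[i][j - 1] + 1)        # insertion: extra word in hyp
    # Walk back from the bottom-right cell, preferring diagonal moves on ties.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = int(ref[i - 1][0] != hyp[j - 1][0])
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def speaker_attribution_error(pairs: List[Pair]) -> float:
    """Share of word-matched pairs whose speaker labels disagree (illustrative only)."""
    matched = [(r, h) for r, h in pairs
               if r is not None and h is not None and r[0] == h[0]]
    if not matched:
        return 0.0
    wrong = sum(1 for r, h in matched if r[1] != h[1])
    return wrong / len(matched)
```

For instance, with reference words `how are you` spoken by A and `fine` spoken by B, and a hypothesis `how you fine` labeled entirely A, the alignment deletes `are`, and one of the three matched words (`fine`) is attributed to the wrong speaker. A single-number score like this hides which kind of error occurred, which is the gap the paper's richer metrics target.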

This paper presents a novel evaluation approach to text-based speaker diarization (SD), tackling the limitations of traditional metrics, which do not account for any contextual information in text. Two new metrics are proposed, Text-based Diarization Error Rate and Diarization F1, which perform utterance- and word-level evaluations by aligning tokens in reference and hypothesis transcripts. Our metrics capture more types of errors than existing ones, allowing a more comprehensive analysis of SD. To align tokens, a multiple sequence alignment algorithm is introduced that supports multiple sequences in the reference while handling high-dimensional alignment to the hypothesis using dynamic programming. Our work is packaged into two tools: align4d, which provides an API for our alignment algorithm, and TranscribeView, which visualizes and evaluates SD errors. Together they can greatly aid in the creation of high-quality data, fostering the advancement of dialogue systems.
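The central idea here is that the reference can hold several sequences (for example, one per speaker) that are aligned jointly against a single hypothesis. The sketch below assumes exactly two reference speakers and unit edit costs, and fills a three-dimensional dynamic-programming table indexed by how many words of each stream have been consumed. It is a naive illustration of that idea, not align4d's API or its efficient algorithm; `align_two_speakers` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int, int]
# (reference speaker or None, reference word or None, hypothesis word or None)
Pair = Tuple[Optional[str], Optional[str], Optional[str]]


def align_two_speakers(ref_a: List[str], ref_b: List[str],
                       hyp: List[str]) -> Tuple[int, List[Pair]]:
    """Jointly align two reference speaker streams against one hypothesis stream.

    State (i, j, k): i words of speaker A, j words of speaker B and k
    hypothesis words consumed. Unit costs for substitution, insertion and
    deletion. Each stream keeps its own word order, but A and B words may
    interleave freely in the hypothesis (e.g. overlapping speech).
    """
    end = (len(ref_a), len(ref_b), len(hyp))
    cost: Dict[State, int] = {(0, 0, 0): 0}
    back: Dict[State, Tuple[State, Pair]] = {}
    # Lexicographic order guarantees every predecessor is filled first.
    for i in range(end[0] + 1):
        for j in range(end[1] + 1):
            for k in range(end[2] + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                options: List[Tuple[int, State, Pair]] = []
                if i > 0 and k > 0:  # speaker A word aligned to a hyp word
                    prev = (i - 1, j, k - 1)
                    options.append((cost[prev] + int(ref_a[i - 1] != hyp[k - 1]),
                                    prev, ("A", ref_a[i - 1], hyp[k - 1])))
                if j > 0 and k > 0:  # speaker B word aligned to a hyp word
                    prev = (i, j - 1, k - 1)
                    options.append((cost[prev] + int(ref_b[j - 1] != hyp[k - 1]),
                                    prev, ("B", ref_b[j - 1], hyp[k - 1])))
                if i > 0:  # speaker A word deleted
                    prev = (i - 1, j, k)
                    options.append((cost[prev] + 1, prev, ("A", ref_a[i - 1], None)))
                if j > 0:  # speaker B word deleted
                    prev = (i, j - 1, k)
                    options.append((cost[prev] + 1, prev, ("B", ref_b[j - 1], None)))
                if k > 0:  # extra hypothesis word inserted
                    prev = (i, j, k - 1)
                    options.append((cost[prev] + 1, prev, (None, None, hyp[k - 1])))
                best = min(options, key=lambda o: o[0])
                cost[(i, j, k)] = best[0]
                back[(i, j, k)] = (best[1], best[2])
    # Follow back-pointers from the final state to recover the alignment.
    pairs: List[Pair] = []
    state = end
    while state != (0, 0, 0):
        state, pair = back[state]
        pairs.append(pair)
    pairs.reverse()
    return cost[end], pairs
```

Because the two speakers' words may interleave, the overlapping example `ref_a = ["yes", "sure"]`, `ref_b = ["okay"]`, `hyp = ["yes", "okay", "sure"]` aligns at cost 0, whereas aligning the hypothesis against the concatenated reference `yes sure okay` would cost 2. The naive table grows with the product of all sequence lengths, which is why an efficient formulation matters once there are more speakers or longer transcripts.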

