AS CL SD · Nov 22, 2022

Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition

arXiv:2211.16319v1 · 10 citations · h-index: 62
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for robust and fair evaluation metrics in code-switching ASR. Its contribution is incremental: it builds on existing methods by providing a new benchmark.

The paper tackles the evaluation of code-switching automatic speech recognition (ASR). It develops a benchmark data set with human judgments, assesses a range of metrics against those judgments, and finds that transliteration followed by text normalization yields the highest correlation with human judgments.
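To illustrate why transliterating before normalizing can matter, here is a minimal sketch, assuming a code-switched hypothesis that writes an English word in Arabic script while the reference writes it in Latin script. Plain word error rate (WER) counts each script mismatch as an error; mapping both sides to one script first removes that penalty. The lexicon `TRANSLIT`, the normalization rules, and all function names are hypothetical illustrations, not the paper's transliteration or normalization method.

```python
import re
import unicodedata

# HYPOTHETICAL word-level transliteration lexicon (illustration only):
# Arabic-script spellings of English words -> Latin script.
TRANSLIT = {
    "اوكي": "okay",
    "ميتنج": "meeting",
}


def transliterate(text: str) -> str:
    """Replace every whitespace-separated token found in the lexicon."""
    return " ".join(TRANSLIT.get(tok, tok) for tok in text.split())


def normalize(text: str) -> str:
    """Assumed normalization rules, not necessarily the paper's."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = re.sub("[\u064b-\u0652]", "", text)              # drop Arabic diacritics
    text = re.sub(r"[^\w\s]", " ", text)                    # punctuation -> space
    return " ".join(text.split())                           # collapse whitespace


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref: str, hyp: str, unit: str = "word") -> float:
    """WER (unit='word') or CER (unit='char', spaces ignored)."""
    if unit == "word":
        r, h = ref.split(), hyp.split()
    else:
        r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
    return edit_distance(r, h) / max(len(r), 1)


def translit_norm_error_rate(ref: str, hyp: str, unit: str = "word") -> float:
    """Transliterate first, then normalize, then score."""
    return error_rate(normalize(transliterate(ref)),
                      normalize(transliterate(hyp)), unit)


ref = "okay let's start the meeting"
hyp = "اوكي let's start the ميتنج"
print(error_rate(ref, hyp))                # 0.4: two script mismatches count as errors
print(translit_norm_error_rate(ref, hyp))  # 0.0
```

The same pipeline works at character granularity by passing `unit="char"`; a real system would replace the toy lexicon with a proper transliteration model.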

Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minimal editing of automatic hypotheses. We validate the guidelines using 4-way inter-annotator agreement. We evaluate a large number of metrics in terms of correlation with human judgments. The metrics we consider vary in terms of representation (orthographic, phonological, semantic), directness (intrinsic vs extrinsic), granularity (e.g. word, character), and similarity computation method. The highest correlation to human judgment is achieved using transliteration followed by text normalization. We release the first corpus for human acceptance of code-switching speech recognition results in dialectal Arabic/English conversational speech.
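The abstract ranks metrics by how well they correlate with human judgments. The sketch below shows one way to do that, assuming each utterance has a single numeric human score (for example, an error rate against the annotators' minimally edited hypothesis; the paper's exact scoring may differ). It computes Pearson and Spearman correlations in plain Python. The function names and the toy scores are hypothetical and are not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes equal-length, non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_metrics(metric_scores, human_scores):
    """Sort metrics by |Pearson r| against the human scores, strongest first."""
    rows = [
        (name, pearson(s, human_scores), spearman(s, human_scores))
        for name, s in metric_scores.items()
    ]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Toy, made-up per-utterance scores (NOT data from the paper).
human = [0.0, 0.1, 0.3, 0.5]
metrics = {
    "metric_a": [0.0, 0.2, 0.6, 1.0],  # perfectly linear in the human score
    "metric_b": [0.9, 0.5, 0.2, 0.1],  # monotonically decreasing
    "metric_c": [0.3, 0.0, 0.4, 0.2],
}
for name, r, rho in rank_metrics(metrics, human):
    print(f"{name}: pearson={r:.3f} spearman={rho:.3f}")
```

Sorting by absolute correlation is a design choice: a similarity-style metric (higher is better) is expected to correlate negatively with an error-style human score, so the sign alone would misrank it.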
