CL · LG · SD · AS · Apr 2, 2023

Multilingual Word Error Rate Estimation: e-WER3

arXiv:2304.00649v1 · 9 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses the problem of performance evaluation for multilingual ASR systems, which is crucial for voice-driven applications, though it appears incremental as it builds on prior monolingual estimation methods.

The paper tackles the challenge of measuring multilingual automatic speech recognition performance without manual transcriptions by proposing eWER3, a framework that jointly trains on acoustic and lexical representations to estimate word error rate, achieving a 9% absolute increase in Pearson correlation coefficient compared to a previous monolingual method.

The success of multilingual automatic speech recognition (ASR) systems has empowered many voice-driven applications. However, measuring the performance of such systems remains a major challenge because it depends on manually transcribed speech data in both mono- and multilingual scenarios. In this paper, we propose a novel multilingual framework, eWER3, jointly trained on acoustic and lexical representations to estimate word error rate (WER). We demonstrate the effectiveness of eWER3 to (i) predict WER without using any internal states from the ASR system and (ii) use the multilingual shared latent space to improve performance on closely related languages. We show that our proposed multilingual model outperforms the previous monolingual word error rate estimation method (eWER2) by an absolute 9% increase in Pearson correlation coefficient (PCC), with better overall agreement between the predicted and reference WER.
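
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed. It is not the paper's code: the function names `word_error_rate` and `pearson` are hypothetical, WER is taken as word-level edit distance divided by reference length, and PCC is the standard sample correlation between two lists of per-utterance scores.

```python
from math import sqrt


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

In an evaluation of a WER estimator, `word_error_rate` would supply the reference WER for each utterance from its manual transcription, and `pearson` would then compare those values against the estimator's predictions; a PCC closer to 1 indicates that predicted WER tracks reference WER more closely.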
