CL LG SD AS · Apr 2, 2023

Multilingual Word Error Rate Estimation: e-WER3

arXiv:2304.00649v1 · 2.59 citations · h-index: 32

Originality: Incremental advance

AI Analysis

This work addresses the problem of performance evaluation for multilingual ASR systems, which is crucial for voice-driven applications, though it appears incremental as it builds on prior monolingual estimation methods.

The paper tackles the challenge of measuring multilingual automatic speech recognition (ASR) performance without manual transcriptions. It proposes eWER3, a framework jointly trained on acoustic and lexical representations to estimate word error rate (WER), and reports a 9% absolute increase in Pearson correlation coefficient (PCC) over the previous monolingual method, eWER2.

The success of multilingual automatic speech recognition (ASR) systems has empowered many voice-driven applications. However, measuring the performance of such systems remains a major challenge because it depends on manually transcribed speech data in both monolingual and multilingual scenarios. In this paper, we propose a novel multilingual framework, eWER3, jointly trained on acoustic and lexical representations to estimate word error rate (WER). We demonstrate the effectiveness of eWER3 in (i) predicting WER without using any internal states of the ASR system and (ii) using the multilingual shared latent space to improve performance on closely related languages. We show that our proposed multilingual model outperforms the previous monolingual word error rate estimation method (eWER2) by an absolute 9% increase in Pearson correlation coefficient (PCC), with better overall agreement between the predicted and reference WER.
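To make the two quantities in the abstract concrete, here is a minimal sketch, not taken from the paper, of (a) the reference WER that an estimator such as eWER3 tries to predict, computed with a standard word-level Levenshtein alignment against a manual transcript, and (b) the Pearson correlation coefficient used to compare predicted and reference WER. The function names `word_error_rate` and `pearson_cc` are hypothetical, and the sketch assumes whitespace tokenization; eWER3 itself predicts WER from acoustic and lexical inputs without a transcript and is not reproduced here.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / N,
    where N is the number of reference words (whitespace tokenization assumed)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


def pearson_cc(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Hypothetical helper: Pearson correlation between predicted and reference WERs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined when either input is constant")
    return cov / (sx * sy)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6 (one substitution and one deletion over six reference words). An estimator's quality is then summarized by applying `pearson_cc` to its predicted WERs and the corresponding reference WERs; how the paper aggregates utterances before computing PCC is not specified here.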

