CL | Jan 18, 2025

A Benchmark of French ASR Systems Based on Error Severity

Antoine Tholly, Jane Wottawa, Mickael Rouvier, Richard Dufour

arXiv:2501.10879v1 | 20.919 citations | h-index: 16 | COLING

Originality: Incremental advance

AI Analysis

This work addresses a limitation of existing ASR evaluation metrics by focusing on error severity as human readers perceive it, specifically for French-language systems. It is incremental in that it builds on prior error categorization approaches.

The authors propose a metric that categorizes ASR errors into severity levels based on linguistic criteria, and apply it to 10 state-of-the-art French ASR systems to reveal each system's strengths and weaknesses in terms of readability for users.

Automatic Speech Recognition (ASR) transcription errors are commonly assessed using metrics that compare them with a reference transcription, such as Word Error Rate (WER), which measures spelling deviations from the reference, or semantic score-based metrics. However, these approaches often overlook what is understandable to humans when interpreting transcription errors. To address this limitation, a new evaluation is proposed that categorizes errors into four levels of severity, further divided into subtypes, based on objective linguistic criteria, contextual patterns, and the use of content words as the unit of analysis. This metric is applied to a benchmark of 10 state-of-the-art ASR systems for French, encompassing both HMM-based and end-to-end models. Our findings reveal the strengths and weaknesses of each system, identifying those that provide the most comfortable reading experience for users.
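For readers unfamiliar with the baseline metric the abstract mentions, here is a minimal sketch of WER as a word-level edit distance divided by the reference length. This is the standard textbook definition, not the severity metric proposed in the paper; the function name `word_error_rate`, the whitespace tokenization, and the French example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Assumes whitespace tokenization and no text normalization (hypothetical
    simplifications; real evaluations usually normalize case and punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "le chat dort"
agreement_error = "le chat dors"   # homophone of the reference, wrong verb form
meaning_error = "le chien dort"    # a different animal entirely

print(word_error_rate(reference, agreement_error))
print(word_error_rate(reference, meaning_error))
```

Both hypotheses contain exactly one substituted word out of three, so WER scores them identically even though a reader would likely find the second error more disruptive. This is the kind of gap that a severity-based categorization is meant to expose.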

