CL · May 5

A Comprehensive Analysis of Tokenization and Self-Supervised Learning in End-to-End Automatic Speech Recognition applied on French Language

Thibault Bañeras-Roux, Mickael Rouvier, Jane Wottawa, Richard Dufour

arXiv:2605.03696 · 17.03 citations

Predicted impact: top 68% in CL · last 90 days · Originality: Synthesis-oriented

AI Analysis

For researchers building French ASR systems, this work highlights the importance of evaluating beyond standard error rates, though it is an incremental study limited to a single language.

This paper analyzes the impact of subword tokenization and self-supervised learning on end-to-end ASR for French, using a comprehensive set of metrics beyond CER/WER. The study provides qualitative insights into how these choices affect linguistic and acoustic aspects of transcription.

The performance of end-to-end automatic speech recognition (ASR) systems has enabled their growing integration into numerous applications. While such speech-to-text systems offer various benefits, the choice of hyperparameters and models plays a crucial role in their performance. Typically, these choices are made by considering only the character error rate (CER) and/or the word error rate (WER). However, several studies have shown that these metrics are largely incomplete and fail to reflect how useful automatic transcripts are for downstream applications. In this paper, we conduct a qualitative study on the French language that investigates the impact of subword tokenization algorithms and self-supervised learning models from different linguistic and acoustic perspectives, using a comprehensive set of evaluation metrics.
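For readers less familiar with the two standard metrics, the sketch below shows what WER and CER measure: a Levenshtein edit distance over words or characters, divided by the length of the reference. It is a minimal illustration, not the paper's evaluation code. It assumes whitespace splitting for words, no text normalization (no lowercasing or punctuation stripping) and a non-empty reference. The function names `edit_distance`, `wer` and `cer` are hypothetical. The French example hints at one way the two metrics can disagree: silent plural markers are inaudible, so missing them costs a whole word in WER but only one character each in CER.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two token sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j] (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words (assumes a non-empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length (assumes a non-empty reference)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "les chats sont noirs"
    hyp = "les chat sont noir"  # the two silent plural "s" are missing
    print(wer(ref, hyp))  # 2 substituted words out of 4
    print(cer(ref, hyp))  # 2 deleted characters out of 20
```

Here the same two errors give a WER of 0.5 but a CER of 0.1. Neither number says whether the transcript is still usable for a given downstream task. That gap is what motivates looking beyond CER/WER.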
