CL SD AS | Jun 21, 2023

NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning

Kamer Ali Yuksel, Thiago Ferreira, Golara Javadi, Mohamed El-Badrashiny, Ahmet Gunduz

arXiv:2306.12577v12.15 citations | h-index: 13 | Has Code

Originality: Incremental advance

AI Analysis

NoRefER addresses the need of researchers and developers for efficient ASR evaluation, though the advance is incremental because it builds on existing language models and contrastive learning techniques.

The paper tackles the problem of evaluating automatic speech recognition (ASR) systems without costly ground-truth transcripts by introducing NoRefER, a referenceless quality metric that uses semi-supervised language model fine-tuning with contrastive learning, achieving high correlation with reference-based metrics.

This paper introduces NoRefER, a novel referenceless quality metric for automatic speech recognition (ASR) systems. Traditional reference-based metrics for evaluating ASR systems require costly ground-truth transcripts. NoRefER overcomes this limitation by fine-tuning a multilingual language model for pair-wise ranking of ASR hypotheses, using contrastive learning with a Siamese network architecture. The self-supervised NoRefER exploits the known quality relationships between hypotheses from multiple compression levels of an ASR model to learn to rank intra-sample hypotheses by quality, which is essential for model comparisons. The semi-supervised version also uses a referenced dataset to improve its inter-sample quality ranking, which is crucial for selecting potentially erroneous samples. The results show that NoRefER correlates highly with reference-based metrics and their intra-sample ranks, suggesting strong potential for referenceless ASR evaluation or A/B testing.
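The self-supervised pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The assumptions are that hypotheses for one utterance are listed from the least to the most compressed ASR model, that less compression implies higher quality, and that a logistic pairwise ranking loss stands in for whatever contrastive objective the paper actually uses. The names `build_pairs`, `pairwise_logistic_loss`, and `toy_score` are hypothetical; `toy_score` is a placeholder for the fine-tuned language model.

```python
import math
from itertools import combinations
from typing import Callable, List, Tuple


def build_pairs(hypotheses: List[str]) -> List[Tuple[str, str]]:
    """Build (better, worse) pairs for one utterance.

    Assumption: `hypotheses` is ordered from the least compressed ASR
    model to the most compressed one, and less compression means
    higher quality.
    """
    pairs = []
    for i, j in combinations(range(len(hypotheses)), 2):
        better, worse = hypotheses[i], hypotheses[j]
        if better != worse:  # identical outputs carry no ranking signal
            pairs.append((better, worse))
    return pairs


def pairwise_logistic_loss(
    score: Callable[[str], float], pairs: List[Tuple[str, str]]
) -> float:
    """Mean of log(1 + exp(-(s_better - s_worse))) over all pairs.

    Siamese setup: the same scorer is applied to both members of a pair.
    """
    if not pairs:
        return 0.0
    total = 0.0
    for better, worse in pairs:
        margin = score(better) - score(worse)
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)


def toy_score(text: str) -> float:
    """Hypothetical stand-in scorer: word count, for illustration only."""
    return float(len(text.split()))


# Hypotheses for one utterance, least compressed model first (made-up data).
hyps = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on mat",
    "cat sat mat",
]
pairs = build_pairs(hyps)
loss = pairwise_logistic_loss(toy_score, pairs)
```

In a real setup the scorer would be a trainable model and the loss would be minimized by gradient descent; the sketch only shows how ranking pairs can be derived without reference transcripts.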
