CL · SD · AS · Jun 5, 2025

Customizing Speech Recognition Model with Large Language Model Feedback

arXiv:2506.11091v2
h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses domain adaptation for speech recognition, particularly for rare named entities, but is incremental as it builds on existing reinforcement learning and LLM feedback techniques.

The paper tackles the problem of speech recognition models struggling with rare named entities and domain mismatches by proposing a reinforcement learning approach that uses large language model feedback for unsupervised domain adaptation, achieving a 21% improvement in entity word error rate over conventional self-training methods.

Automatic speech recognition (ASR) systems have achieved strong performance on general transcription tasks. However, they continue to struggle with recognizing rare named entities and adapting to domain mismatches. In contrast, large language models (LLMs), trained on massive internet-scale datasets, are often more effective across a wide range of domains. In this work, we propose a reinforcement-learning-based approach for unsupervised domain adaptation that leverages unlabeled data to enhance transcription quality, particularly for the named entities affected by domain mismatch, through feedback from an LLM. Given contextual information, our framework employs an LLM as the reward model to score the hypotheses from the ASR model. These scores serve as reward signals to fine-tune the ASR model via reinforcement learning. Our method achieves a 21% improvement in entity word error rate over conventional self-training methods.
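The loop described in the abstract (sample hypotheses, score them with a reward model given context, use the scores as reward signals) can be illustrated with a toy policy-gradient sketch. Everything here is an assumption made for illustration, not the paper's implementation: the "ASR model" is reduced to a softmax over a fixed list of candidate transcripts, `toy_llm_reward` is a hypothetical stand-in that counts context entity words instead of querying an LLM, and the update is plain REINFORCE with a mean-reward baseline, which the abstract does not name as the algorithm used.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_llm_reward(context, hypothesis):
    """Hypothetical stand-in for the LLM reward model.

    Returns the fraction of context words (treated as expected entities)
    that appear in the hypothesis. A real system would query an LLM here.
    """
    entities = context.lower().split()
    words = set(hypothesis.lower().split())
    return sum(1 for e in entities if e in words) / len(entities)


def reinforce_step(logits, hypotheses, context, rng, lr=0.5, num_samples=8):
    """One REINFORCE update of the toy policy (assumed algorithm)."""
    probs = softmax(logits)
    # Sample hypotheses from the current policy, as an ASR decoder would.
    samples = rng.choices(range(len(hypotheses)), weights=probs, k=num_samples)
    rewards = [toy_llm_reward(context, hypotheses[i]) for i in samples]
    baseline = sum(rewards) / len(rewards)  # variance-reduction baseline
    grads = [0.0] * len(logits)
    for i, r in zip(samples, rewards):
        advantage = r - baseline
        for j in range(len(logits)):
            # d log p_i / d logit_j = 1[i == j] - p_j
            grads[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
    return [l + lr * g / num_samples for l, g in zip(logits, grads)]


def adapt(hypotheses, context, steps=200, seed=0):
    """Run the toy adaptation loop and return final hypothesis probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(hypotheses)  # start from a uniform policy
    for _ in range(steps):
        logits = reinforce_step(logits, hypotheses, context, rng)
    return softmax(logits)


if __name__ == "__main__":
    candidates = [
        "call doctor nguyen at mayo clinic",
        "call doctor win at mayo clinic",
        "call doctor when at male clinic",
    ]
    for text, p in zip(candidates, adapt(candidates, context="nguyen mayo")):
        print(f"{p:.3f}  {text}")
```

No labeled transcripts are used in this sketch: the only supervision is the reward computed from the context, which is what makes the setting unsupervised. In a real system the policy would be the ASR model's sequence distribution and the gradient would flow into its parameters rather than into three standalone logits.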
