CL AS Feb 21

ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models

Zefang Liu, Chenyang Zhu, Sangwoo Cho, Shi-Xiong Zhang

arXiv:2602.18721v1

Originality: Highly original

AI Analysis

This paper addresses error propagation in semi-supervised ASR and offers a novel method for improving pseudo-label quality, though it is incremental in that it builds on existing pseudo-labeling techniques.

The paper tackles confirmation bias and error accumulation in semi-supervised speech recognition by proposing ReHear, a framework that uses an audio-aware large language model to iteratively refine pseudo-labels. The authors report that it consistently outperforms supervised and pseudo-labeling baselines across diverse benchmarks.

Semi-supervised learning in automatic speech recognition (ASR) typically relies on pseudo-labeling, which often suffers from confirmation bias and error accumulation due to noisy supervision. To address this limitation, we propose ReHear, a framework for iterative pseudo-label refinement that integrates an instruction-tuned, audio-aware large language model (LLM) into the self-training loop. Unlike conventional text-based correctors, our approach conditions the LLM on both the ASR hypothesis and the source audio, allowing it to recover phonetically accurate transcripts even from severe recognition errors. These refined pseudo-labels serve as high-fidelity targets for fine-tuning the ASR model in an iterative cycle. Experimental results across diverse benchmarks demonstrate that ReHear effectively mitigates error propagation, consistently outperforming both supervised and pseudo-labeling baselines.
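The cycle described in the abstract (transcribe unlabeled audio, refine each hypothesis with access to the audio, fine-tune on the refined labels, repeat) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: `rehear_style_loop`, `toy_initial_asr`, `toy_refine`, `toy_fine_tune`, and the `TRUTH` table are hypothetical stand-ins, and the toy refiner looks up a reference transcript where a real system would condition an audio-aware LLM on the waveform.

```python
from typing import Callable, List, Tuple

Audio = str  # hypothetical stand-in for a waveform or file path
Pair = Tuple[Audio, str]
Transcribe = Callable[[Audio], str]
Refine = Callable[[Audio, str], str]
FineTune = Callable[[List[Pair]], Transcribe]


def rehear_style_loop(
    transcribe: Transcribe,
    refine: Refine,
    fine_tune: FineTune,
    labeled: List[Pair],
    unlabeled: List[Audio],
    rounds: int = 2,
) -> Tuple[Transcribe, List[List[Pair]]]:
    """Iterative pseudo-label refinement (illustrative sketch)."""
    history: List[List[Pair]] = []
    for _ in range(rounds):
        # 1. The current ASR model produces hypotheses for unlabeled audio.
        hypotheses = [(audio, transcribe(audio)) for audio in unlabeled]
        # 2. The refiner sees both the audio and the hypothesis.
        refined = [(audio, refine(audio, hyp)) for audio, hyp in hypotheses]
        history.append(refined)
        # 3. Fine-tune on labeled data plus the refined pseudo-labels.
        transcribe = fine_tune(labeled + refined)
    return transcribe, history


# --- Toy stand-ins (assumptions, for demonstration only) ---
TRUTH = {"a1": "the cat sat", "a2": "on the mat", "a3": "hello world"}


def toy_initial_asr(audio: Audio) -> str:
    # Simulates systematic recognition errors: every "a" becomes "e".
    return TRUTH[audio].replace("a", "e")


def toy_refine(audio: Audio, hypothesis: str) -> str:
    # Stand-in for an audio-conditioned corrector: fixes one wrong word per call.
    reference = TRUTH[audio].split()
    words = hypothesis.split()
    for i, (word, ref) in enumerate(zip(words, reference)):
        if word != ref:
            words[i] = ref
            break
    return " ".join(words)


def toy_fine_tune(pairs: List[Pair]) -> Transcribe:
    # Stand-in for fine-tuning: the "model" memorizes its training targets.
    table = dict(pairs)
    return lambda audio: table.get(audio, "")
```

Calling `rehear_style_loop(toy_initial_asr, toy_refine, toy_fine_tune, [("a3", "hello world")], ["a1", "a2"], rounds=2)` returns the final toy model together with the refined labels from each round. In this toy setup each round removes one word error per utterance, which mirrors the abstract's point that refinement happens inside the self-training loop and not as a one-off post-processing step.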
