CL ASApr 2, 2025

Chain of Correction for Full-text Speech Recognition with Large Language Models

Zhiyuan Tang, Dong Wang, Zhikai Zhou, Yong Liu, Shen Huang, Shidong Shang

Tencent

arXiv:2504.01519v21 citationsh-index: 17Has Code

Originality Incremental advance

AI Analysis

This addresses stability and controllability issues in ASR error correction for applications like transcription, but it is incremental as it builds on existing LLM-based approaches.

The paper tackles the problem of correcting errors in full-text automatic speech recognition outputs, such as punctuation and normalization, by proposing a Chain of Correction method that uses a multi-turn chat format to process segments with context, resulting in significant performance improvements over baseline systems.

Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) is attracting increased attention for its ability to address a wide range of error types, such as punctuation restoration and inverse text normalization, across long context. However, challenges remain regarding stability, controllability, completeness, and fluency. To mitigate these issues, this paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding. Utilizing the open-sourced ChFT dataset, we fine-tune a pre-trained LLM to evaluate CoC's performance. Experiments show that CoC significantly outperforms baseline and benchmark systems in correcting full-text ASR outputs. We also analyze correction thresholds to balance under-correction and over-rephrasing, extrapolate CoC on extra-long ASR outputs, and explore using other types of information to guide error correction.

View on arXiv PDF

Similar