CL, SD, AS · Jul 9, 2023

Can Generative Large Language Models Perform ASR Error Correction?

arXiv:2307.04172v2 · 89 citations · h-index: 61
Originality: Incremental advance
AI Analysis

The work addresses two drawbacks of traditional ASR error correction methods: they are computationally intensive to train, and each model is tuned to a specific ASR system. It offers a more flexible post-processing approach for speech recognition systems.

The paper tackles the problem of ASR error correction by using ChatGPT, a generative large language model, to post-process speech recognition outputs, showing that it yields performance gains for two state-of-the-art ASR architectures and multiple test sets.

ASR error correction is an interesting option for post-processing speech recognition system outputs. These error correction models are usually trained in a supervised fashion using the decoding results of a target ASR system. This approach can be computationally intensive, and the resulting model is tuned to a specific ASR system. Recently, generative large language models (LLMs) have been applied to a wide range of natural language processing tasks, as they can operate in a zero-shot or few-shot fashion. In this paper, we investigate using ChatGPT, a generative LLM, for ASR error correction. Based on the ASR N-best output, we propose both unconstrained approaches and constrained approaches, in which a member of the N-best list is selected. Additionally, zero-shot and 1-shot settings are evaluated. Experiments show that this generative LLM approach can yield performance gains for two different state-of-the-art ASR architectures, transducer and attention-encoder-decoder based, and for multiple test sets.
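To make the unconstrained/constrained and zero-/1-shot distinctions concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the authors' method: the prompt wording in the hypothetical `build_prompt` is not the paper's prompt, no LLM API call is shown (the model reply is simply a string), and `constrain_to_nbest` is one assumed way to guarantee a constrained result is an N-best member. It maps a free-form reply to the nearest hypothesis by word edit distance. The same edit distance gives the word error rate (WER) used to score outputs.

```python
import string
from typing import List, Optional, Tuple


def normalize(text: str) -> List[str]:
    """Assumed simple normalization: lowercase, drop punctuation, split on whitespace."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete a reference word
                cur[j - 1] + 1,          # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute (free if the words match)
            )
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edit distance divided by the reference length."""
    ref = normalize(reference)
    return word_edit_distance(ref, normalize(hypothesis)) / max(len(ref), 1)


def build_prompt(nbest: List[str], constrained: bool,
                 example: Optional[Tuple[List[str], str]] = None) -> str:
    """Hypothetical prompt: zero-shot when example is None, otherwise 1-shot."""
    task = ("Select the most likely transcription from the list."
            if constrained else
            "Output the most likely correct transcription; you may edit the words.")
    parts = [f"The following are N-best hypotheses from a speech recognizer. {task}"]
    if example is not None:  # 1-shot: prepend one solved example
        ex_nbest, ex_answer = example
        parts.append("Example hypotheses:\n" + "\n".join(ex_nbest))
        parts.append("Example answer: " + ex_answer)
    parts.append("Hypotheses:\n" + "\n".join(nbest))
    parts.append("Answer:")
    return "\n\n".join(parts)


def constrain_to_nbest(llm_output: str, nbest: List[str]) -> str:
    """Assumed fallback: return the N-best member closest to the LLM reply."""
    out = normalize(llm_output)
    return min(nbest, key=lambda h: word_edit_distance(normalize(h), out))


# Usage: the reply below stands in for whatever the LLM returns.
nbest = ["i scream for ice cream", "eye scream for ice cream", "i scream four ice cream"]
prompt = build_prompt(nbest, constrained=True)
reply = "I scream for ice cream."
chosen = constrain_to_nbest(reply, nbest)  # → "i scream for ice cream"
```

In the unconstrained setting the reply would be used directly (after normalization). The constrained setting trades the chance of fixing errors absent from every hypothesis for protection against the LLM rewriting a correct transcription.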
