Retrieval Augmented Generation based context discovery for ASR
This work addresses the challenge of automatically identifying the right context for ASR systems. The contribution is incremental, since it builds on existing retrieval and LLM methods.
This work tackles automatic context discovery for context-aware Automatic Speech Recognition (ASR), with the goal of improving transcription accuracy for rare or out-of-vocabulary terms. It proposes an embedding-based retrieval approach that reduces word error rate (WER) by up to 17% relative to a no-context baseline in experiments on TED-LIUMv3, Earnings21, and SPGISpeech.
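To make the WER figures concrete, here is a minimal sketch of word-level WER, (S + D + I) / N, computed with a word-level Levenshtein alignment, plus a relative-reduction helper. It assumes "relative" means (baseline WER − contextual WER) / baseline WER; the function names and the toy sentences are hypothetical illustrations, not the paper's evaluation code.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Assumed definition: percentage drop in WER relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0
```

For example, if a no-context system hears the rare term "kubernetes" as "cooper net is", the toy reference "the kubernetes cluster scaled out" scores 3 errors out of 5 words (WER 0.6). A contextual hypothesis with a single substitution scores 0.2, a relative reduction of about 66.7% under the assumed definition.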
This work investigates retrieval-augmented generation as an efficient strategy for automatic context discovery in context-aware Automatic Speech Recognition (ASR) systems, which use contextual information to improve transcription accuracy in the presence of rare or out-of-vocabulary terms. Identifying the right context automatically, however, remains an open challenge. This work proposes an efficient embedding-based retrieval approach for automatic context discovery in ASR. To contextualize its effectiveness, two alternatives based on large language models (LLMs) are also evaluated: (1) LLM-based context generation via prompting, and (2) post-recognition transcript correction using LLMs. Experiments on the TED-LIUMv3, Earnings21, and SPGISpeech datasets demonstrate that the proposed approach reduces word error rate (WER) by up to 17% (percentage difference) relative to using no context, while the oracle context yields a reduction of up to 24.1%.
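The general idea of embedding-based context retrieval can be sketched as: embed a query and a set of candidate context passages, rank the passages by cosine similarity, and keep the top-k as context. This is a minimal sketch under stated assumptions, not the paper's method. The query source (here, a first-pass transcript), the passage corpus, and all function names are hypothetical. A sparse bag-of-words vector stands in for a learned sentence encoder.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a neural sentence encoder:
    # a sparse bag-of-words vector of lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors; missing words count as 0.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_context(query: str, passages: list[str], k: int = 1) -> list[str]:
    # Rank candidate passages by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]
```

With a hypothetical first-pass transcript such as "the cooper net is cluster scaled out when container load rises" and candidate passages about container orchestration, earnings calls, and coral reefs, the orchestration passage ranks first because it shares "cluster" and "container" with the query. How the retrieved text is then passed to a context-aware ASR model (for example as a prompt or biasing list) is not shown and would depend on the model.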