ASCL Aug 5, 2025

RAG-Boost: Retrieval-Augmented Generation Enhanced LLM-based Speech Recognition

arXiv:2508.14048v1 | 1 citation | h-index: 17
Originality: Synthesis-oriented
AI Analysis

The paper addresses recognition accuracy issues for users of LLM-based speech recognition systems, though it appears incremental because it builds on existing RAG and ASR methods.

The paper tackles recognition errors in LLM-based ASR systems by adding a retrieval-augmented generation (RAG) module. Each partial hypothesis queries a vector store of audio-text pairs and domain terms, the retrieved results are used to correct the hypothesis, and the corrected hypothesis is passed to the LLM, which yields improved responses.

In this paper, we propose RAG-Boost (ST-ShinozakiLab Task I system), which enhances the baseline LLM-based ASR system of the MLC-SLM Challenge (Task I) with a retrieval-augmented generation (RAG) module that operates on the fly. Each partial ASR hypothesis queries a vector store of audio-text pairs and domain terms, and the retrieved results are fused with the live ASR hypotheses to fix recognition errors. The fused hypotheses are then passed to the LLM, yielding improved responses.
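
The retrieve-then-fuse loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character n-gram embedding, the text-only store (the paper's store also holds audio-text pairs), the word-level fusion rule, the 0.75 threshold, and all names (`embed`, `VectorStore`, `fuse`, `build_prompt`) are assumptions chosen only to keep the example self-contained.

```python
from collections import Counter
from difflib import SequenceMatcher
import math


def embed(text, n=3):
    """Toy embedding: bag of character n-grams (assumption, not the paper's encoder)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Hypothetical in-memory store of domain terms (text only in this sketch)."""

    def __init__(self, terms):
        self.entries = [(term, embed(term)) for term in terms]

    def query(self, text, k=2):
        """Return the k stored terms most similar to the query text."""
        q = embed(text)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [term for term, _ in ranked[:k]]


def fuse(hypothesis, retrieved, threshold=0.75):
    """Replace a hypothesis word with a retrieved term when they are near-identical strings."""
    fused = []
    for word in hypothesis.split():
        scored = [(SequenceMatcher(None, word.lower(), t.lower()).ratio(), t) for t in retrieved]
        score, term = max(scored, default=(0.0, word))
        fused.append(term if score >= threshold else word)
    return " ".join(fused)


def build_prompt(fused_hypothesis):
    """Hypothetical prompt wrapper; the actual LLM interface is not specified in the source."""
    return f"Transcript: {fused_hypothesis}\nRespond to the transcript above."


store = VectorStore(["Shinozaki", "Whisper", "Kubernetes"])
partial_hypothesis = "we deploy on kubernetis today"
retrieved = store.query(partial_hypothesis, k=2)
fused = fuse(partial_hypothesis, retrieved)
print(fused)  # we deploy on Kubernetes today
print(build_prompt(fused))
```

In a streaming setting, the same `query` and `fuse` calls would presumably run on each new partial hypothesis, which is why both steps are kept cheap here.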

