AS CL Jul 5, 2024

Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models

Bolaji Yusuf, Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran

arXiv:2407.04641v13.33 citations h-index: 51

Originality: Synthesis-oriented

AI Analysis

This work addresses latency reduction in speech recognition systems, but it appears incremental because it builds on existing ASR and language model techniques.

The paper tackles the problem of reducing latency in automatic speech recognition by introducing speculative speech recognition, which allows the recognizer to run ahead of audio, and demonstrates its efficacy on various datasets.

This paper explores speculative speech recognition (SSR), where we empower conventional automatic speech recognition (ASR) with speculation capabilities, allowing the recognizer to run ahead of audio. We introduce a metric for measuring SSR performance and we propose a model which does SSR by combining an RNN-Transducer-based ASR system with an audio-prefixed language model (LM). The ASR system transcribes ongoing audio and feeds the resulting transcripts, along with an audio-dependent prefix, to the LM, which speculates likely completions for the transcriptions. We experiment with a variety of ASR datasets, on which we show the efficacy of our method and the feasibility of SSR as a method of reducing ASR latency.
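
The data flow described in the abstract (the ASR transcribes the audio so far, then the LM receives that transcript plus an audio-dependent prefix and proposes a continuation) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in written for illustration: `toy_asr`, `toy_audio_prefix`, `toy_lm`, and `speculate` are not names from the paper, each "audio frame" is simply a word, and the "LM" is a fixed phrase lookup and not a trained model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy types: each "audio frame" is already one word.
Frame = str


def toy_asr(frames: Sequence[Frame]) -> List[str]:
    """Stand-in for a streaming ASR: transcript of the audio seen so far."""
    return list(frames)


def toy_audio_prefix(frames: Sequence[Frame]) -> int:
    """Stand-in for the audio-dependent prefix (here: the frame count)."""
    return len(frames)


def toy_lm(prefix: int, transcript: List[str], max_new: int) -> List[str]:
    """Stand-in for the audio-prefixed LM.

    A real model would condition on `prefix`; this toy ignores it and
    completes the transcript from one fixed phrase, if it matches.
    """
    phrase = ["turn", "on", "the", "kitchen", "lights"]
    n = len(transcript)
    if transcript == phrase[:n]:
        return phrase[n:n + max_new]
    return []  # no speculation when the transcript is unfamiliar


def speculate(
    frames: Sequence[Frame],
    asr: Callable[[Sequence[Frame]], List[str]],
    make_prefix: Callable[[Sequence[Frame]], int],
    lm: Callable[[int, List[str], int], List[str]],
    max_new: int = 2,
) -> Tuple[List[str], List[str]]:
    """Return (transcript so far, speculated continuation)."""
    transcript = asr(frames)            # 1. transcribe ongoing audio
    prefix = make_prefix(frames)        # 2. build the audio-dependent prefix
    completion = lm(prefix, transcript, max_new)  # 3. speculate ahead
    return transcript, completion


if __name__ == "__main__":
    stream = ["turn", "on", "the", "kitchen", "lights"]
    for t in range(1, len(stream) + 1):
        partial, guess = speculate(stream[:t], toy_asr, toy_audio_prefix, toy_lm)
        print(" ".join(partial), "->", " ".join(guess))
```

The sketch only shows the order of operations. How the prefix is computed, how the LM is adapted, and how speculation quality is scored are specified in the paper itself and are not reproduced here.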
