Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems
This work addresses the challenge of improving ASR accuracy by leveraging multiple linguistic representations from LLMs, though it appears incremental, as it builds on existing knowledge-transfer techniques.
The authors tackled the problem of incorporating linguistic knowledge into end-to-end automatic speech recognition (ASR) systems by transferring multiple representations from large language models (LLMs), and showed that this approach can be an effective alternative to transferring only a single representation.
Transferring the knowledge of large language models (LLMs) is a promising technique for incorporating linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works transfer only a single representation of an LLM (e.g., the last layer of a pretrained BERT), whereas the representation of a text is inherently non-unique and can be obtained in various ways: from different layers, contexts, and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. Although the approach is conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single representation.
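The abstract does not state which transfer objective is used. As a minimal sketch, assuming each LLM representation (for example, one per BERT layer, context, or model) is transferred by a cosine-distance regression from a projected ASR hidden state, with one hypothetical projection head per representation, a combined transfer term could look like this. All names (`multi_rep_transfer_loss`, `heads`, `teachers`) are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 when a and b point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine distance is undefined for zero-norm vectors")
    return 1.0 - dot / (na * nb)


def multi_rep_transfer_loss(
    student: Sequence[float],
    heads: Sequence[Matrix],
    teachers: Sequence[Sequence[float]],
) -> float:
    """Average cosine distance over K teacher representations (assumed objective).

    student:  a hidden vector from the ASR encoder (hypothetical).
    heads:    heads[k] projects the student into the k-th teacher's space
              (hypothetical: one head per representation).
    teachers: teachers[k] is the k-th LLM representation of the same text,
              e.g. a different BERT layer, context window, or model.
    """
    if not teachers or len(heads) != len(teachers):
        raise ValueError("need one projection head per teacher representation")
    losses = [cosine_distance(matvec(w, student), t) for w, t in zip(heads, teachers)]
    # Averaging keeps the term's scale independent of the number of representations K.
    return sum(losses) / len(losses)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # One teacher is aligned with the student (distance 0), one is orthogonal (distance 1).
    print(multi_rep_transfer_loss([1.0, 0.0], [identity, identity], [[2.0, 0.0], [0.0, 1.0]]))
```

In a training loop, such a term would typically be added with some weight to the transducer loss; the single-representation baseline corresponds to K = 1.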