Incorporating Language Level Information into Acoustic Models
This work addresses the challenge of improving Automatic Speech Recognition (ASR) accuracy by incorporating linguistic context, though it appears incremental because it builds on existing deep recurrent neural network methods.
The paper tackles the problem of integrating language-level information into acoustic models for Automatic Speech Recognition (ASR) by proposing Recurrent Deep Language Networks (RDLNs), which make it possible to fine-tune the entire ASR system during acoustic modeling.
This paper proposes a novel class of Deep Recurrent Neural Networks that can incorporate language-level information into acoustic models. For simplicity, we name these networks Recurrent Deep Language Networks (RDLNs). Multiple variants of RDLNs are considered, covering two kinds of context information, two methods of processing the context, and two methods of incorporating the language-level information. RDLNs offer a possible way to fine-tune the whole Automatic Speech Recognition (ASR) system during the acoustic modeling process.
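The abstract does not say how the language-level information enters the network, so the following is only an illustrative sketch, not the RDLN architecture itself. It assumes one common conditioning scheme: a per-frame language-level context vector is concatenated with the acoustic feature frame before a plain tanh recurrent layer. All names (`rnn_step`, `run_sequence`), the layer sizes, and the toy inputs are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, h_prev, frame, context):
    """One recurrent step over an acoustic frame plus a context vector.

    Assumption: fusion by concatenation. The source does not specify
    the mechanism RDLNs actually use.
    """
    x = frame + context  # list concatenation: [acoustic features | context]
    pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h_prev))]
    return [math.tanh(p) for p in pre]


def run_sequence(frames, contexts, hidden_size=4, seed=0):
    """Run the toy recurrent layer over a sequence; return all hidden states."""
    rng = random.Random(seed)
    in_size = len(frames[0]) + len(contexts[0])
    w_in = make_matrix(hidden_size, in_size, rng)
    w_rec = make_matrix(hidden_size, hidden_size, rng)
    h = [0.0] * hidden_size
    states = []
    for frame, ctx in zip(frames, contexts):
        h = rnn_step(w_in, w_rec, h, frame, ctx)
        states.append(h)
    return states


# Toy data: two 3-dimensional acoustic frames, each paired with a
# 2-dimensional language-level context vector (hypothetical values).
frames = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
contexts = [[1.0, 0.0], [0.0, 1.0]]
states = run_sequence(frames, contexts)
```

Because the context vector is part of the recurrent layer's input, a loss computed on the acoustic output would also propagate gradients to whatever produces the context, which is the sense in which such a design could allow joint fine-tuning. The training procedure itself is not shown here.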