CL, SD · Dec 14, 2016

Recurrent Deep Stacking Networks for Speech Recognition

Peidong Wang, Zhongqiu Wang, Deliang Wang

arXiv:1612.04675v2 · 0.81 citations

Originality: Incremental advance

AI Analysis

This work targets speech recognition accuracy in noisy environments. It appears incremental, as it builds on existing deep learning methods.

The authors tackled robust automatic speech recognition by integrating phoneme-level information into acoustic models using Recurrent Deep Stacking Networks (RDSNs) and Bi-Pass Stacking Networks (BPSNs), resulting in substantial performance improvements over conventional DNNs.

This paper presented our work on applying Recurrent Deep Stacking Networks (RDSNs) to Robust Automatic Speech Recognition (ASR) tasks. In the paper, we also proposed a more efficient yet comparable substitute for RDSN, the Bi-Pass Stacking Network (BPSN). The main idea of these two models is to add phoneme-level information into acoustic models, transforming an acoustic model into the combination of an acoustic model and a phoneme-level N-gram model. Experiments showed that RDSN and BPSN can substantially improve performance over conventional DNNs.
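
The stacking idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that "phoneme-level information" means the network's own output vectors from the previous few frames, concatenated onto the current frame's acoustic features, and that the bi-pass variant replaces the frame-by-frame recurrence with two batch passes (zeros in the output slots first, then first-pass outputs). The names `toy_dnn`, `rdsn_forward`, `bpsn_forward`, and `context` are hypothetical, and a single softmax layer stands in for a full DNN.

```python
import numpy as np

def toy_dnn(x, W, b):
    """Single softmax layer standing in for a full acoustic model (assumption)."""
    z = x @ W + b
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rdsn_forward(feats, W, b, n_classes, context=2):
    """Sequential pass: each frame sees the outputs of the previous `context` frames."""
    history = [np.zeros(n_classes) for _ in range(context)]  # oldest first
    outputs = []
    for x in feats:
        y = toy_dnn(np.concatenate([x] + history), W, b)
        outputs.append(y)
        history = history[1:] + [y]  # slide the output window forward
    return np.stack(outputs)

def bpsn_forward(feats, W, b, n_classes, context=2):
    """Two batch passes: zeros in the output slots, then first-pass outputs."""
    T = len(feats)
    zeros = np.zeros((T, context * n_classes))
    first = toy_dnn(np.hstack([feats, zeros]), W, b)
    # Row t of `prev` holds first-pass outputs of frames t-context .. t-1.
    padded = np.vstack([np.zeros((context, n_classes)), first])
    prev = np.hstack([padded[k:k + T] for k in range(context)])
    return toy_dnn(np.hstack([feats, prev]), W, b)
```

Under these assumptions, conditioning each frame on its previous `context` outputs plays a role similar to an N-gram over phoneme-level predictions with N = `context` + 1. The two-pass version needs no sequential loop over frames, which is one plausible reading of "more efficient".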
