CL · SD · Dec 14, 2016

Recurrent Deep Stacking Networks for Speech Recognition

arXiv:1612.04675v2 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses speech recognition accuracy for applications in noisy environments, though it appears incremental as it builds on existing deep learning methods.

The authors tackled robust automatic speech recognition by integrating phoneme-level information into acoustic models using Recurrent Deep Stacking Networks (RDSNs) and Bi-Pass Stacking Networks (BPSNs), resulting in substantial performance improvements over conventional DNNs.

This paper presented our work on applying Recurrent Deep Stacking Networks (RDSNs) to Robust Automatic Speech Recognition (ASR) tasks. In the paper, we also proposed a more efficient yet comparable substitute for RDSN, the Bi-Pass Stacking Network (BPSN). The main idea of these two models is to add phoneme-level information to acoustic models, transforming an acoustic model into the combination of an acoustic model and a phoneme-level N-gram model. Experiments showed that RDSN and BPSN can substantially improve performance over conventional DNNs.
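The idea of feeding phoneme-level information back into the acoustic model can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the phoneme-level information is the posterior vector of the previous few frames, concatenated with the current frame's acoustic features, and that the bi-pass variant gets those posteriors from a first pass run with a zeroed context. The names make_model, forward, rdsn_decode, bpsn_decode, the layer sizes, and the random untrained weights are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def make_model(feat_dim, n_phones, history, hidden=16, seed=0):
    # Hypothetical one-hidden-layer acoustic model with random (untrained) weights.
    # Its input is the acoustic frame plus `history` previous posterior vectors.
    rng = np.random.default_rng(seed)
    in_dim = feat_dim + history * n_phones
    return {
        "W1": rng.normal(scale=0.1, size=(in_dim, hidden)),
        "W2": rng.normal(scale=0.1, size=(hidden, n_phones)),
    }

def forward(model, x):
    # Works for a single input vector or a (T, in_dim) batch.
    h = np.tanh(x @ model["W1"])
    return softmax(h @ model["W2"])

def rdsn_decode(model, feats, n_phones, history):
    # Recurrent stacking (assumed form): frame t sees the model's own
    # posteriors for frames t-1 .. t-history, so frames run sequentially.
    T = feats.shape[0]
    posts = np.zeros((T, n_phones))
    for t in range(T):
        ctx = np.zeros((history, n_phones))
        for k in range(1, history + 1):
            if t - k >= 0:
                ctx[k - 1] = posts[t - k]
        posts[t] = forward(model, np.concatenate([feats[t], ctx.ravel()]))
    return posts

def bpsn_decode(model, feats, n_phones, history):
    # Bi-pass stacking (assumed form): pass 1 uses a zeroed context for all
    # frames at once; pass 2 reuses pass-1 posteriors as the context.
    T = feats.shape[0]
    zero_ctx = np.zeros((T, history * n_phones))
    first = forward(model, np.hstack([feats, zero_ctx]))
    padded = np.vstack([np.zeros((history, n_phones)), first])
    ctx = np.hstack([padded[history - k : history - k + T]
                     for k in range(1, history + 1)])
    return forward(model, np.hstack([feats, ctx]))

# Toy usage: 5 frames of 4-dimensional features, 3 phoneme classes, 2 frames of history.
feats = np.random.default_rng(1).normal(size=(5, 4))
model = make_model(feat_dim=4, n_phones=3, history=2)
rdsn_posts = rdsn_decode(model, feats, n_phones=3, history=2)
bpsn_posts = bpsn_decode(model, feats, n_phones=3, history=2)
```

In this sketch the recurrent variant must process frames one at a time because each frame depends on earlier outputs, whereas the bi-pass variant runs two batched passes over the whole utterance, which is one plausible reading of why the abstract calls BPSN more efficient.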

