AS · SD · Mar 23, 2018

Exploring the robustness of features and enhancement on speech recognition systems in highly-reverberant real environments

arXiv:1803.09013v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses speech recognition performance in challenging real-world acoustic conditions, but its contribution is incremental, since it compares existing enhancement and feature methods.

This paper evaluates the robustness of a DNN-HMM speech recognition system in highly-reverberant real environments. With reverberated training, Weighted Prediction Error (WPE) enhancement combined with locally-normalized filter bank (LNFB) features yields word error rates (WERs) that are on average 3% lower than those obtained with Suppression of Slowly-varying components and the Falling edge (SSF) and 20% lower than those obtained with Non-negative Matrix Factorization (NMF).

This paper evaluates the robustness of a DNN-HMM-based speech recognition system in highly-reverberant real environments using the HRRE database. The performance of locally-normalized filter bank (LNFB) and Mel filter bank (MelFB) features in combination with the Non-negative Matrix Factorization (NMF), Suppression of Slowly-varying components and the Falling edge (SSF), and Weighted Prediction Error (WPE) enhancement methods is discussed and evaluated. Two training conditions were considered: clean and reverberated (Reverb). With Reverb training, the use of WPE and LNFB provides WERs that are on average 3% and 20% lower than those of SSF and NMF, respectively. WPE and MelFB provide WERs that are on average 11% and 24% lower than those of SSF and NMF, respectively. With clean training, which represents a significant mismatch between testing and training conditions, LNFB features clearly outperform MelFB features. The results show that different types of training, parametrization, and enhancement techniques may work better for a specific combination of speaker-microphone distance and reverberation time. This suggests that there could be some degree of complementarity between systems trained with different enhancement and parametrization methods.
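The abstract reports its results as word error rates (WERs) and as percentage reductions between enhancement methods. As a reading aid, here is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words) together with a helper for a relative reduction. It is not the paper's scoring pipeline. The function names `word_error_rate` and `relative_reduction` are hypothetical, and treating the reported percentages as relative reductions is an assumption, since the abstract does not say whether they are relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Relative WER reduction (assumed interpretation of 'X% lower')."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative sentences and numbers only; not taken from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(relative_reduction(0.50, 0.40))
```

Under the relative interpretation, a system at 40% WER is 20% lower than a baseline at 50% WER. Under an absolute interpretation the same pair would be described as 10 points lower, which is why the distinction matters when comparing the figures above.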
