Mar 9, 2020

Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

arXiv:2003.03998v1 (115 citations)
AI Analysis

The paper addresses unsatisfactory ASR performance in noisy conditions for single-channel systems. The contribution is incremental, since it builds on existing time-domain enhancement methods.

The paper tackles noise-robust automatic speech recognition (ASR) for single-channel systems by applying a time-domain denoising front-end, achieving more than 30% relative word error reduction over a strong ASR back-end on the real evaluation data of the CHiME-4 single-channel track.

With the advent of deep learning, research on noise-robust automatic speech recognition (ASR) has progressed rapidly. However, the ASR performance of single-channel systems in noisy conditions remains unsatisfactory. Indeed, most single-channel speech enhancement (SE) methods (i.e., denoising) have brought only limited performance gains over state-of-the-art ASR back-ends trained on multi-condition training data. Recently, there has been much research on neural-network-based SE methods operating in the time domain, which show levels of performance never attained before. However, it has not been established whether the high enhancement performance achieved by such time-domain approaches translates into improved ASR. In this paper, we show that a single-channel time-domain denoising approach can significantly improve ASR performance, providing more than 30% relative word error reduction over a strong ASR back-end on the real evaluation data of the single-channel track of the CHiME-4 dataset. These positive results demonstrate that single-channel noise reduction can still improve ASR performance, which should open the door to more research in that direction.
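The headline result is stated as a relative word error reduction. The sketch below is not code from the paper; it only illustrates the standard metric, assuming word error rate (WER) is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the reference length, and that relative reduction is 100 × (baseline WER − new WER) / baseline WER. The function names and the example transcripts are hypothetical and are not data from CHiME-4.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed as the word-level Levenshtein (edit) distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(reference)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative word error reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    ref = "the cat sat on the mat".split()
    noisy_hyp = "the cat sat in a hat".split()       # 3 substitutions / 6 words
    enhanced_hyp = "the cat sat on a mat".split()    # 1 substitution / 6 words
    wer_noisy = word_error_rate(ref, noisy_hyp)
    wer_enhanced = word_error_rate(ref, enhanced_hyp)
    print(f"{relative_wer_reduction(wer_noisy, wer_enhanced):.1f}%")  # prints 66.7%
```

Note that the reduction is relative, not absolute: a hypothetical baseline of 10% WER lowered to 7% is a 30% relative reduction but only 3 points absolute.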
