SDAS Jul 25, 2020

Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

arXiv:2007.12903v1
22 citations
AI Analysis

This work addresses the challenge of keeping multi-channel ASR accurate in noisy environments, offering an incremental improvement over existing joint optimization techniques.

The paper trains a robust front-end for multi-channel automatic speech recognition without aligned clean-noisy speech pairs by incorporating flow-based density estimation on non-parallel clean and noisy speech. On the CHiME-4 dataset, it outperforms conventional front-ends trained only with the ASR objective.

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective is sub-optimal and insufficient for fully training the front-end, which still leaves room for improvement. In this paper, we propose a novel approach which incorporates flow-based density estimation for the robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms the conventional techniques where the front-end is trained only with the ASR objective.

