SD, LG, ML · Apr 2, 2019

Unsupervised training of neural mask-based beamforming

arXiv:1904.01578v2 · 26 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of robust speech processing in noisy and reverberant environments, offering an unsupervised alternative that avoids reliance on simulated data or teacher models.

The paper tackles the problem of training a neural mask estimator for beamforming without parallel data (paired degraded and clean signals) or a pre-trained teacher model. On the CHiME 4 and REVERB datasets, it achieves speech recognition performance on par with a supervised system trained on oracle target masks.

We present an unsupervised training approach for a neural network-based mask estimator in an acoustic beamforming application. The network is trained to maximize a likelihood criterion derived from a spatial mixture model of the observations. It is trained from scratch without requiring any parallel data consisting of degraded input and clean training targets. Thus, training can be carried out on real recordings of noisy speech rather than simulated ones. In contrast to previous work on unsupervised training of neural mask estimators, our approach avoids the need for a possibly pre-trained teacher model entirely. We demonstrate the effectiveness of our approach by speech recognition experiments on two different datasets: one mainly deteriorated by noise (CHiME 4) and one by reverberation (REVERB). The results show that the performance of the proposed system is on par with a supervised system using oracle target masks for training and with a system trained using a model-based teacher.
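
To make the idea of a likelihood criterion derived from a spatial mixture model more concrete, the following is a minimal NumPy sketch for a single frequency bin. It rests on assumptions that the abstract does not state: the mixture components are taken to be complex angular central Gaussian (cACG) densities, the network's masks are used both as weights for a one-step covariance estimate and as time-varying mixture weights, and the function names (`cacg_log_pdf`, `spatial_mixture_log_likelihood`) are hypothetical. It is an illustration of the general technique, not the authors' implementation, and it is not differentiable as written; a training setup would express the same computation in an autodiff framework and minimize the negative of this value.

```python
import math
import numpy as np


def cacg_log_pdf(z, B, eps=1e-10):
    """Log-density of a complex angular central Gaussian (assumed model).

    z: (T, D) complex, unit-norm observation vectors.
    B: (D, D) Hermitian positive-definite parameter matrix.
    Returns an array of shape (T,).
    """
    D = z.shape[-1]
    B_inv = np.linalg.inv(B)
    # Quadratic form z^H B^{-1} z for every frame.
    quad = np.real(np.einsum("td,de,te->t", z.conj(), B_inv, z))
    quad = np.maximum(quad, eps)
    _, logdet = np.linalg.slogdet(B)
    # Normalizer: (D-1)! / (2 * pi^D * det(B)).
    log_norm = math.lgamma(D) - math.log(2.0) - D * math.log(math.pi)
    return log_norm - logdet - D * np.log(quad)


def spatial_mixture_log_likelihood(Y, masks, eps=1e-10, reg=1e-6):
    """Log-likelihood of multichannel observations under a mask-driven mixture.

    Y:     (T, D) complex STFT coefficients of one frequency bin.
    masks: (K, T) non-negative mask values, assumed to sum to 1 over K.
    """
    T, D = Y.shape
    # Project observations onto the unit sphere (only direction is modeled).
    z = Y / np.maximum(np.linalg.norm(Y, axis=-1, keepdims=True), eps)

    log_terms = []
    for gamma in masks:
        # Assumption: one mask-weighted covariance step stands in for the
        # class parameter estimate (no fixed-point iteration).
        B = np.einsum("t,td,te->de", gamma, z, z.conj()) / max(gamma.sum(), eps)
        B = B + reg * np.eye(D)  # keep B invertible
        # Assumption: the mask also acts as the per-frame mixture weight.
        log_terms.append(np.log(np.maximum(gamma, eps)) + cacg_log_pdf(z, B))

    log_terms = np.stack(log_terms)  # (K, T)
    # Numerically stable log-sum-exp over the K classes, then sum over time.
    m = log_terms.max(axis=0)
    return float(np.sum(m + np.log(np.exp(log_terms - m).sum(axis=0))))
```

Under these assumptions, masks that separate sources with distinct spatial signatures yield a higher value than uninformative masks, which is what lets the criterion act as a training signal without clean targets.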

