AS · LG · SD · ML · Feb 14, 2020

Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

arXiv:2002.05879v1 · 24 citations
AI Analysis

This work addresses the challenge of improving subjective sound quality in speech enhancement. The contribution is incremental, since it builds on a prior method that approximates the quality score with an auxiliary network.

The paper tackled the problem of training deep neural networks for speech enhancement using non-differentiable perceptual quality metrics like PESQ by proposing stabilization techniques from reinforcement learning to overcome training instability. The result was stable training that achieved state-of-the-art PESQ scores on a public dataset and better subjective sound quality than conventional methods.

Improving the subjective sound quality of enhanced signals is one of the most important missions in speech enhancement. For evaluating subjective quality, several methods for perceptually-motivated objective sound quality assessment (OSQA) have been proposed, such as PESQ (perceptual evaluation of speech quality). However, such measures cannot be used directly for training a deep neural network (DNN) in most cases, because popular OSQAs are non-differentiable with respect to the DNN parameters. Therefore, a previous study proposed approximating the OSQA score with an auxiliary DNN so that its gradient can be used for training the primary DNN. One problem with this approach is training instability caused by the approximation error of the score. To overcome this problem, we propose to use stabilization techniques borrowed from reinforcement learning. The experiments, which aimed to increase the PESQ score as an example, show that the proposed method (i) can stably train a DNN to increase PESQ, (ii) achieves a state-of-the-art PESQ score on a public dataset, and (iii) yields better sound quality than conventional methods in a subjective evaluation.
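The surrogate-gradient idea described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: `black_box_score` is a hypothetical stand-in for a non-differentiable metric (it is not PESQ), the "primary model" is reduced to a single scalar parameter, and the auxiliary DNN is replaced by a local linear least-squares fit. The stabilization techniques the paper borrows from reinforcement learning are not shown, because the abstract does not specify them.

```python
import random


def black_box_score(theta):
    """Hypothetical non-differentiable quality metric (a stand-in, not PESQ).

    Rounding makes the score piecewise constant, so its true gradient
    is zero almost everywhere and cannot drive training directly.
    """
    return round(-(theta - 3.0) ** 2, 1)


def fit_linear_surrogate(points):
    """Least-squares fit of score ~ slope * theta + intercept.

    Plays the role of the auxiliary model that approximates the score.
    """
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_s = sum(s for _, s in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (s - mean_s) for t, s in points)
    slope = cov / var_t
    return slope, mean_s - slope * mean_t


def train(theta=0.0, steps=200, lr=0.05, radius=0.5, samples=20, seed=0):
    """Gradient ascent on the surrogate instead of the black-box score."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Query the black box near the current parameter value.
        points = []
        for _ in range(samples):
            t = theta + rng.uniform(-radius, radius)
            points.append((t, black_box_score(t)))
        # The surrogate's slope serves as the gradient estimate.
        slope, _ = fit_linear_surrogate(points)
        theta += lr * slope
    return theta


if __name__ == "__main__":
    theta = train()
    print("learned parameter:", theta)
    print("black-box score:", black_box_score(theta))
```

Because the update follows the surrogate's gradient rather than the true score, any approximation error in the fit feeds straight into the parameter update, which is the source of instability the abstract refers to.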

