SD · CL · Sep 4, 2017

Using Optimal Ratio Mask as Training Target for Supervised Speech Separation

arXiv:1709.00917v1 · 20 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses speech separation for audio processing applications. It is incremental: it builds on existing training-target methods rather than introducing a new paradigm.

The paper tackles supervised speech separation by using an optimal ratio mask as the training target for a deep neural network (DNN). This target generally outperforms other training targets across various noise environments and signal-to-noise ratio (SNR) conditions, although the abstract does not report specific numerical gains.

Supervised speech separation uses supervised learning algorithms to learn a mapping from a noisy input signal to an output target. With the rapid development of deep learning, supervised separation has become the most important direction in speech separation in recent years. For supervised algorithms, the training target has a significant impact on performance. The ideal ratio mask (IRM) is a commonly used training target that can improve the intelligibility and quality of the separated speech. However, it does not take into account the correlation between the noise and the clean speech. In this paper, we use the optimal ratio mask as the training target of a deep neural network (DNN) for speech separation. Experiments are carried out under various noise environments and signal-to-noise ratio (SNR) conditions. The results show that the optimal ratio mask generally outperforms the other training targets.
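To make the difference between the two targets concrete, here is a minimal sketch that computes both masks for a single time-frequency (T-F) bin, given the clean-speech STFT coefficient `s` and the noise coefficient `n` (so the noisy mixture is `y = s + n`). It assumes the optimal ratio mask (ORM) is the real-valued gain that minimizes the squared error between the clean speech and the masked mixture, which gives ORM = (|S|² + Re(S N*)) / (|S|² + |N|² + 2 Re(S N*)); the cross term Re(S N*) is exactly the speech-noise correlation that the IRM ignores. The IRM exponent `beta = 0.5` is a common choice, not taken from the abstract. The function names are hypothetical, and the paper's exact formulation (for example, any range compression applied before training, since the ORM is unbounded) may differ.

```python
def irm(s: complex, n: complex, beta: float = 0.5, eps: float = 1e-12) -> float:
    """Ideal ratio mask for one T-F bin: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    Uses only the speech and noise powers, so the phase relationship
    (correlation) between S and N is ignored.
    """
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return (ps / (ps + pn + eps)) ** beta


def orm(s: complex, n: complex, eps: float = 1e-12) -> float:
    """Optimal ratio mask for one T-F bin (assumed MSE-optimal formulation).

    Real gain g minimizing |S - g * Y|^2 with Y = S + N:
        g = (|S|^2 + Re(S N*)) / (|S|^2 + |N|^2 + 2 Re(S N*))
    The denominator equals |Y|^2; eps guards against an all-zero bin.
    """
    cross = (s * n.conjugate()).real  # speech-noise correlation term
    num = abs(s) ** 2 + cross
    den = abs(s) ** 2 + abs(n) ** 2 + 2 * cross
    return num / (den + eps)


def masked_error(s: complex, n: complex, gain: float) -> float:
    """Squared error between clean speech and the gain-scaled mixture."""
    y = s + n
    return abs(s - gain * y) ** 2


if __name__ == "__main__":
    # Noise that partially cancels the speech in this bin (negative correlation).
    s, n = 1 + 0j, -0.5 + 0j
    for name, g in (("IRM", irm(s, n)), ("ORM", orm(s, n))):
        print(name, round(g, 3), round(masked_error(s, n, g), 3))
```

When speech and noise are orthogonal in a bin (zero cross term), the ORM reduces to the IRM with `beta = 1`. When they are correlated, as in the usage example, the ORM can exceed 1 and recovers the clean coefficient with near-zero error, while the IRM leaves a residual error.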

