A Supervised Speech Enhancement Approach with Residual Noise Control for Voice Communication
This work addresses speech quality in voice communication applications, but the contribution is incremental: it builds on existing supervised approaches and adds a new loss function.
The paper tackles speech enhancement for voice communication by deriving a generalized loss function that incorporates residual noise control, enabling a better trade-off between speech distortion and noise reduction; objective and subjective tests verify the importance of residual noise control.
For voice communication, it is important to extract the speech from its noisy version without introducing unnatural artificial noise. By studying the subband mean-squared error (MSE) of the speech for unsupervised speech enhancement approaches and revealing its relationship with the existing loss function for supervised approaches, this paper derives a generalized loss function for supervised approaches that takes residual noise control into account. The generalized loss function contains the well-known MSE loss function and many other commonly used loss functions as special cases. Compared with traditional loss functions, it offers more flexibility in trading off speech distortion against noise reduction, because a group of well-studied noise shaping schemes can be introduced to control residual noise in practical applications. Objective and subjective test results verify the importance of residual noise control for the supervised speech enhancement approach.
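To make the trade-off between speech distortion and residual noise concrete, here is a minimal sketch of one possible loss of this kind. It is not the paper's formula: the function name `tradeoff_loss`, the weight `mu`, the target residual-noise attenuation `alpha`, and the assumption of a real-valued gain per frequency bin are all hypothetical choices made for illustration. With `mu = 1` and `alpha = 0`, the expression equals the per-bin MSE between clean and enhanced speech once the speech-noise cross term is neglected, which is zero in expectation when speech and noise are uncorrelated.

```python
from typing import Sequence


def tradeoff_loss(
    speech: Sequence[complex],
    noise: Sequence[complex],
    gain: Sequence[float],
    mu: float = 1.0,
    alpha: float = 0.0,
) -> float:
    """Hypothetical loss: speech distortion plus a weighted residual-noise error.

    speech, noise: clean-speech and noise spectral coefficients, one per bin.
    gain: real-valued gain applied to the noisy coefficient in each bin.
    mu: assumed weight on the residual-noise term.
    alpha: assumed target attenuation of the noise (0 means remove it all).
    """
    if not (len(speech) == len(noise) == len(gain)):
        raise ValueError("speech, noise and gain must have the same length")
    if len(gain) == 0:
        raise ValueError("inputs must not be empty")

    distortion = 0.0  # energy of the speech removed by the gain
    residual = 0.0    # deviation of the filtered noise from its target level
    for s, n, g in zip(speech, noise, gain):
        distortion += abs((1.0 - g) * s) ** 2
        residual += abs((g - alpha) * n) ** 2

    # Average over bins so the value does not depend on the number of bins.
    return (distortion + mu * residual) / len(gain)


if __name__ == "__main__":
    s = [1 + 0j, 0 + 2j]
    n = [0.5 + 0j, 1 + 0j]
    g = [0.5, 0.5]
    print(tradeoff_loss(s, n, g))             # all noise should be removed
    print(tradeoff_loss(s, n, g, alpha=0.5))  # a noise floor of 0.5 is allowed
```

In this sketch, raising `mu` penalizes leftover noise more heavily at the cost of more speech distortion, while a nonzero `alpha` lets a controlled noise floor remain instead of forcing the noise to zero.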