A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement
This work addresses noise suppression for real-time speech processing applications, presenting an incremental improvement over traditional spectral estimators.
The paper tackles real-time full-band speech enhancement by combining a deep neural network with traditional pitch filtering, achieving significantly higher quality than a minimum mean squared error spectral estimator while maintaining low complexity for 48 kHz operation on a low-power processor.
Despite noise suppression being a mature area in signal processing, it remains highly dependent on fine tuning of estimator algorithms and parameters. In this paper, we demonstrate a hybrid DSP/deep learning approach to noise suppression. A deep neural network with four hidden layers is used to estimate ideal critical band gains, while a more traditional pitch filter attenuates noise between pitch harmonics. The approach achieves significantly higher quality than a traditional minimum mean squared error spectral estimator, while keeping the complexity low enough for real-time operation at 48 kHz on a low-power processor.
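The band-gain idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the text above does not spell out: the "ideal" gain of a band is taken to be the square root of the clean-to-noisy energy ratio in that band, bands are contiguous runs of FFT bins delimited by a hypothetical `edges` list, and each gain is applied flat across its band with no interpolation. The function names are illustrative only. The neural network and the pitch filter are not modelled here.

```python
import math

def band_energies(power, edges):
    """Sum per-bin power within each band [edges[i], edges[i+1])."""
    return [sum(power[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]

def ideal_band_gains(clean_power, noisy_power, edges, eps=1e-12):
    """Assumed definition: sqrt(clean energy / noisy energy), capped at 1."""
    clean = band_energies(clean_power, edges)
    noisy = band_energies(noisy_power, edges)
    return [min(1.0, math.sqrt(c / (n + eps))) for c, n in zip(clean, noisy)]

def apply_band_gains(spectrum, gains, edges):
    """Scale every bin of a band by that band's gain (flat, no interpolation)."""
    out = list(spectrum)
    for g, lo, hi in zip(gains, edges[:-1], edges[1:]):
        for k in range(lo, hi):
            out[k] = spectrum[k] * g
    return out

# Toy example: 4 bins split into 2 bands of 2 bins each.
edges = [0, 2, 4]
clean_power = [1.0, 1.0, 0.0, 0.0]   # speech energy only in the first band
noisy_power = [4.0, 4.0, 1.0, 1.0]   # noise present in both bands
gains = ideal_band_gains(clean_power, noisy_power, edges)
enhanced = apply_band_gains([2.0, 2.0, 2.0, 2.0], gains, edges)
```

In this sketch the gains are computed from a known clean signal, which is only possible when building training targets. At run time, a network such as the one described in the abstract would estimate them from the noisy input alone. Working with a small number of band gains rather than one value per frequency bin keeps the number of outputs small, which fits the stated low-complexity goal.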