Cellular Network Speech Enhancement: Removing Background and Transmission Noise
This work addresses speech quality issues for users in noisy environments during cellular VoIP calls, though it appears incremental as it builds on existing DNS Challenge frameworks.
The paper tackles the removal of both background and transmission noise in speech enhancement over cellular networks, specifically for VoIP applications such as Google Meet, and reports state-of-the-art performance with scores of 1.92 PESQ and 0.88 STOI.
The primary objective of speech enhancement is to reduce background noise while preserving the target speaker's speech. A common dilemma occurs when a speaker is confined to a noisy environment and the listener receives a call with high background and transmission noise. To address this problem, the Deep Noise Suppression (DNS) Challenge focuses on removing background noise with next-generation deep learning models to enhance the target speaker's speech; however, researchers have not considered Voice over IP (VoIP) applications and their transmission noise. Focusing on Google Meet and its cellular application, our work achieves state-of-the-art performance on the Google Meet To Phone Track of the VoIP DNS Challenge. This paper demonstrates how to surpass industrial performance, achieving 1.92 PESQ and 0.88 STOI as well as superior acoustic fidelity, perceptual quality, and intelligibility across various metrics.