Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks
This work addresses audio quality enhancement for applications such as telephony and speech recognition. It is an incremental advance that applies a new method to a known bottleneck.
The paper tackles speech bandwidth expansion, which improves audio quality by extending the frequency range of low-bandwidth signals. It presents a high-fidelity generative adversarial network that outperforms previous methods and shows zero-shot capability across expansion factors, including factors not seen during training.
Speech bandwidth expansion extends the frequency range of low-bandwidth speech signals, improving audio quality, clarity, and perceptibility in digital applications. Its applications span telephony, compression, text-to-speech synthesis, and speech recognition. This paper presents a novel approach based on a high-fidelity generative adversarial network. Unlike cascaded systems, our system is trained end-to-end on paired narrowband and wideband speech signals. Our method integrates multiple bandwidth upsampling ratios into a single unified model designed specifically for speech bandwidth expansion. The approach performs robustly across a range of bandwidth expansion factors, including factors not encountered during training, which demonstrates zero-shot capability. To the best of our knowledge, this is the first work to show this capability. Experimental results show that our method outperforms previous end-to-end approaches as well as interpolation and other traditional techniques, demonstrating its effectiveness for practical speech enhancement.
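The abstract does not say how the paired narrowband and wideband signals are produced. One common approach, assumed here and not taken from the paper, is to low-pass filter wideband speech and decimate it by an integer factor. Running the same pipeline with several factors yields the kind of multi-ratio training pairs a single unified model would consume. The sketch below also implements a linear-interpolation baseline, which restores the original sample rate but cannot recover the discarded high band; that missing band is what a generative model is meant to synthesize. All function names (`lowpass_fir`, `make_pair`, `interpolate_baseline`, `band_energy`), the windowed-sinc filter, the 101-tap length, and the synthetic test signal are hypothetical illustrative choices.

```python
import numpy as np


def lowpass_fir(cutoff, num_taps=101):
    """Hamming-windowed sinc low-pass FIR; cutoff is a fraction of Nyquist in (0, 1]."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unit gain at DC


def make_pair(wideband, factor, num_taps=101):
    """Hypothetical pair builder: simulate a narrowband signal for an integer factor."""
    h = lowpass_fir(1.0 / factor, num_taps)               # anti-aliasing filter
    filtered = np.convolve(wideband, h, mode="same")      # keep original length
    narrowband = filtered[::factor]                       # decimate to the lower rate
    return narrowband, wideband


def interpolate_baseline(narrowband, factor, target_len):
    """Linear-interpolation baseline: resample narrowband back to the wideband rate."""
    t_nb = np.arange(len(narrowband)) * factor            # narrowband sample positions
    t_wb = np.arange(target_len)                          # wideband sample positions
    return np.interp(t_wb, t_nb, narrowband)


def band_energy(x, fs, lo_hz):
    """Spectral energy of x above lo_hz (Hz)."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return spectrum[freqs > lo_hz].sum()


fs = 16_000                                   # assumed wideband sample rate (Hz)
t = np.arange(fs) / fs                        # one second of synthetic signal
wideband = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)

results = {}
for factor in (2, 3, 4):                      # several expansion ratios, one pipeline
    nb, wb = make_pair(wideband, factor)
    baseline = interpolate_baseline(nb, factor, len(wb))
    cutoff_hz = fs / (2 * factor)             # Nyquist frequency of the narrowband signal
    ratio = band_energy(baseline, fs, cutoff_hz) / band_energy(wb, fs, cutoff_hz)
    results[factor] = (nb, baseline, ratio)
    print(f"x{factor}: {len(nb)} -> {len(wb)} samples, high-band energy ratio {ratio:.2e}")
```

The printed ratio stays far below 1 for every factor: interpolation brings the signal back to 16 kHz but leaves the band above the narrowband Nyquist frequency essentially empty. The 6 kHz tone here stands in for the high-frequency speech content that bandwidth expansion aims to restore.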