SDAIAS Jul 26, 2024

Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks

arXiv:2407.18571v2
h-index: 16
Originality: Highly original
AI Analysis

This work addresses audio quality enhancement for applications like telephony and speech recognition, representing an incremental advance with a novel method for a known bottleneck.

The paper tackles speech bandwidth expansion to improve audio quality by expanding the frequency range of low-bandwidth signals, and it presents a high-fidelity generative adversarial network that outperforms previous methods and demonstrates zero-shot capability across various expansion factors.

Speech bandwidth expansion is crucial for expanding the frequency range of low-bandwidth speech signals, thereby improving audio quality, clarity and perceptibility in digital applications. Its applications span telephony, compression, text-to-speech synthesis, and speech recognition. This paper presents a novel approach using a high-fidelity generative adversarial network. Unlike cascaded systems, our system is trained end-to-end on paired narrowband and wideband speech signals. Our method integrates various bandwidth upsampling ratios into a single unified model specifically designed for speech bandwidth expansion applications. Our approach exhibits robust performance across various bandwidth expansion factors, including those not encountered during training, demonstrating zero-shot capability. To the best of our knowledge, this is the first work to showcase this capability. The experimental results demonstrate that our method outperforms previous end-to-end approaches, as well as interpolation and traditional techniques, showcasing its effectiveness in practical speech enhancement applications.
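As a rough illustration of what "paired narrowband and wideband speech signals" and an "interpolation" baseline can look like, the sketch below derives a narrowband signal from a wideband one by anti-aliased downsampling, then upsamples it back. This is not the paper's data pipeline or model: the abstract does not say how pairs are constructed, and the function names `make_pair` and `interpolation_baseline`, the integer-factor restriction, and the synthetic two-tone signal are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import resample_poly


def make_pair(wideband, factor):
    """Build a (narrowband, wideband) pair for an integer expansion factor.

    Hypothetical helper: assumes the narrowband input is obtained by
    low-pass filtering and decimating the wideband target.
    """
    # Trim so the wideband length is an exact multiple of the factor.
    n = (len(wideband) // factor) * factor
    wideband = wideband[:n]
    # resample_poly applies an anti-aliasing FIR filter before decimating.
    narrowband = resample_poly(wideband, up=1, down=factor)
    return narrowband, wideband


def interpolation_baseline(narrowband, factor):
    """Upsample by plain polyphase interpolation (no learned model).

    This restores the sample rate but cannot recreate the missing
    high-frequency content, which is what a generative model targets.
    """
    return resample_poly(narrowband, up=factor, down=1)


# Synthetic stand-in for speech: 1 s at 16 kHz with a low and a high tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
wide = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)

# Factor 2 simulates an 8 kHz narrowband signal (content above 4 kHz is lost).
narrow, target = make_pair(wide, factor=2)
restored = interpolation_baseline(narrow, factor=2)
```

Comparing `restored` with `target` in the frequency domain shows the 6 kHz component missing from the interpolated signal. A single model covering several expansion factors, as the abstract describes, would presumably be trained on pairs built with different values of `factor`, though the abstract does not give those details.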
