AS · LG · SD · SP · Jul 7, 2022

NESC: Robust Neural End-2-End Speech Coding with GANs

arXiv:2207.03282v1 · 18 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the challenge of designing neural speech codecs that stay robust in real-world applications. It is an incremental advance whose main focus is robustness.

The authors tackle robust neural speech coding at very low bit rates with NESC, a scalable end-to-end codec that delivers high-quality wideband speech at 3 kbps. Subjective listening tests on clean and noisy speech show that it is robust to unseen conditions and signal perturbations.

Neural networks have proven to be a formidable tool for tackling speech coding at very low bit rates. However, designing a neural coder that operates robustly under real-world conditions remains a major challenge. We therefore present the Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration that relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work, Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.
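To put the 3 kbps figure in perspective, this short sketch works out the bit budget under stated assumptions: wideband speech sampled at 16 kHz, 16-bit PCM as the uncompressed reference, and a hypothetical 10 ms frame hop. None of these values comes from the paper. They only illustrate what 3 kbps means per sample, per frame, and relative to uncompressed audio.

```python
# Back-of-the-envelope bit budget for a 3 kbps wideband speech codec.
# Assumptions (NOT taken from the paper): 16 kHz sampling for wideband
# speech, 16-bit PCM as the uncompressed reference, and a hypothetical
# 10 ms frame hop used only to illustrate bits per frame.

BITRATE_BPS = 3_000        # target bit rate stated in the abstract: 3 kbps
SAMPLE_RATE_HZ = 16_000    # typical wideband sampling rate (assumption)
PCM_BITS_PER_SAMPLE = 16   # uncompressed reference format (assumption)


def bits_per_sample(bitrate_bps: int, sample_rate_hz: int) -> float:
    """Average number of coded bits spent on each input sample."""
    return bitrate_bps / sample_rate_hz


def bits_per_frame(bitrate_bps: int, frame_ms: float) -> float:
    """Bits available to encode one frame of the given duration."""
    return bitrate_bps * frame_ms / 1000.0


def compression_ratio(bitrate_bps: int, sample_rate_hz: int, pcm_bits: int) -> float:
    """Ratio of the uncompressed PCM bit rate to the codec bit rate."""
    return sample_rate_hz * pcm_bits / bitrate_bps


if __name__ == "__main__":
    print(f"bits per sample:      {bits_per_sample(BITRATE_BPS, SAMPLE_RATE_HZ):.4f}")
    print(f"bits per 10 ms frame: {bits_per_frame(BITRATE_BPS, 10.0):.1f}")
    ratio = compression_ratio(BITRATE_BPS, SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE)
    print(f"compression ratio:    {ratio:.1f}x")
```

Under these assumptions, 3 kbps leaves about 0.19 bits per sample, or 30 bits for each 10 ms frame, roughly an 85-fold reduction relative to 256 kbps PCM. This is why a generative decoder such as a GAN-based vocoder is needed to synthesize plausible detail that is not explicitly transmitted.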
