AS LG SD SP

Jul 7, 2022

NESC: Robust Neural End-2-End Speech Coding with GANs

Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus, Guillaume Fuchs

arXiv:2207.03282v14.318 citations

h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses the challenge of designing neural speech codecs that remain robust in real-world applications. It is an incremental advance whose main focus is robustness.

The authors tackle robust neural speech coding at very low bit rates by introducing NESC, a scalable end-to-end codec for high-quality wideband speech coding at 3 kbps. Subjective listening tests on clean and noisy speech show that it is robust to unseen conditions and signal perturbations.

Neural networks have proven to be a formidable tool to tackle the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge. Therefore, we present Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration, which relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work, Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.
