High-quality speech coding with SampleRNN
This addresses speech compression for applications requiring efficient transmission or storage, offering a novel generative approach with competitive quality.
The researchers tackled speech coding by developing a generative model based on SampleRNN that operates at significantly lower bitrates while matching or surpassing the perceptual quality of state-of-the-art classic wide-band codecs, as validated through listening tests.
We provide a speech coding scheme employing a generative model based on SampleRNN that, while operating at significantly lower bitrates, matches or surpasses the perceptual quality of state-of-the-art classic wide-band codecs. Moreover, it is demonstrated that the proposed scheme can provide a meaningful rate-distortion trade-off without retraining. We evaluate the proposed scheme in a series of listening tests and discuss limitations of the approach.