AS | AI | SD | ML | Oct 24, 2022

High Fidelity Neural Audio Compression

arXiv:2210.13438v1 | 1182 citations | h-index: 50 | Has Code
Originality: Highly original
AI Analysis

This work addresses the need for efficient, high-quality audio compression in applications such as streaming and storage. It represents a strong, specific gain rather than a foundational breakthrough.

The paper tackles the problem of real-time, high-fidelity audio compression by introducing a neural codec that outperforms baselines across various audio domains and bandwidths, achieving up to 40% further compression with lightweight Transformers while maintaining faster-than-real-time performance.

We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion. We simplify and speed up training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model, including the training objective, architectural changes, and a study of various perceptual loss functions. We present an extensive subjective evaluation (MUSHRA tests) together with an ablation study for a range of bandwidths and audio domains, including speech, noisy-reverberant speech, and music. Our approach is superior to the baseline methods across all evaluated settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio. Code and models are available at github.com/facebookresearch/encodec.
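The loss balancer idea in the abstract (a loss weight sets the fraction of the overall gradient, independent of the loss's natural scale) can be illustrated with a small sketch. This is not the authors' implementation: the function name `balance_gradients`, the `total_norm` parameter, and the use of the instantaneous gradient norm are assumptions made for illustration, and gradients are represented as plain Python lists rather than tensors.

```python
import math


def balance_gradients(grads, weights, total_norm=1.0, eps=1e-12):
    """Combine per-loss gradients so each weight sets a share of the total.

    grads:   dict mapping a loss name to its gradient (list of floats),
             all taken with respect to the same model output.
    weights: dict mapping the same loss names to non-negative weights.
    total_norm: hypothetical reference norm for the combined update.
    """
    if not grads:
        raise ValueError("at least one gradient is required")
    weight_sum = sum(weights[name] for name in grads)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    size = len(next(iter(grads.values())))
    combined = [0.0] * size
    for name, grad in grads.items():
        # Normalize each gradient so its raw scale no longer matters.
        norm = math.sqrt(sum(g * g for g in grad))
        # The weight, as a fraction of all weights, sets the contribution.
        scale = total_norm * (weights[name] / weight_sum) / (norm + eps)
        for i, g in enumerate(grad):
            combined[i] += scale * g
    return combined


# Two losses with very different gradient scales (made-up numbers).
grads = {"reconstruction": [3.0, 4.0], "adversarial": [0.0, 2000.0]}
weights = {"reconstruction": 1.0, "adversarial": 3.0}
balanced = balance_gradients(grads, weights)
```

In this sketch, multiplying the adversarial gradient by any positive constant leaves `balanced` essentially unchanged, which is the decoupling the abstract describes: the weights 1 and 3 mean roughly 25% and 75% of the combined gradient, whatever the raw magnitudes of the two losses.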

Code Implementations: 6 repos
