AS | AI | LG | SD | SP | Apr 7, 2024

Gull: A Generative Multifunctional Audio Codec

arXiv:2404.04947v2 | 11 citations | h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the need for a multifunctional audio codec that can handle diverse applications. It is an incremental improvement that integrates and enhances existing techniques.

The authors tackled the problem of developing a versatile neural audio codec for tasks like real-time communication and audio super-resolution, achieving performance on par with or better than existing codecs across various sample rates, bitrates, and model complexities.

We introduce Gull, a generative multifunctional audio codec. Gull is a general-purpose neural audio compression and decompression model that can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolution, and codec language models. The key components of Gull include (1) universal-sample-rate modeling via subband modeling schemes motivated by recent progress in audio source separation, (2) gain-shape representations motivated by traditional audio codecs, (3) improved residual vector quantization modules, (4) an elastic decoder network that enables user-defined model size and complexity at inference time, and (5) a built-in ability for audio super-resolution without increasing the bitrate. We compare Gull with existing traditional and neural audio codecs and show that Gull achieves performance on par with or better than theirs across various sample rates, bitrates, and model complexities in both subjective and objective evaluation metrics.
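Two of the components named above, gain-shape representations and residual vector quantization, can be illustrated with a minimal sketch. This is not Gull's implementation: the abstract describes "improved" RVQ modules without giving details, so the code below shows only the plain textbook form of both ideas. The function names, the two-stage layout, and the toy codebook values are all assumptions made for illustration.

```python
import math

def gain_shape(vec, eps=1e-12):
    """Split a vector into a scalar gain (its L2 norm) and a unit-norm shape."""
    gain = math.sqrt(sum(x * x for x in vec))
    shape = [x / (gain + eps) for x in vec]
    return gain, shape

def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Plain residual VQ: each stage quantizes what earlier stages left over."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical toy codebooks (a real codec would learn these from data).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]],            # coarse stage
    [[0.25, 0.25], [0.25, -0.25], [-0.25, 0.25], [-0.25, -0.25]],  # refinement stage
]

frame = [3.0, 4.0]
gain, shape = gain_shape(frame)          # gain carries the energy, shape the direction
indices = rvq_encode(shape, codebooks)   # one codeword index per stage
recon = [gain * s for s in rvq_decode(indices, codebooks)]
```

In this sketch the gain is kept as an exact scalar and only the shape is quantized. A practical codec would also quantize the gain, and each added RVQ stage spends more bits to reduce the remaining residual.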

