AS LG SD Sep 26, 2024

FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates

arXiv:2409.17635v2, 5 citations, h-index: 11
AI Analysis

The paper addresses efficient audio compression for applications that require low bit rates, such as real-time communication and streaming. The approach is incremental but offers practical advantages.

The paper tackles high-quality audio compression at low bit rates by introducing FlowMAC, a neural audio codec based on conditional flow matching. Subjective evaluations show that FlowMAC at 3 kbps achieves quality similar to that of state-of-the-art GAN-based and DDPM-based codecs at double the bit rate (6 kbps).

This paper introduces FlowMAC, a novel neural audio codec for high-quality general audio compression at low bit rates based on conditional flow matching (CFM). FlowMAC jointly learns a mel spectrogram encoder, quantizer, and decoder. At inference time, the decoder integrates a continuous normalizing flow via an ODE solver to generate a high-quality mel spectrogram. This is the first time that a CFM-based approach has been applied to general audio coding, enabling scalable, simple, and memory-efficient training. Our subjective evaluations show that FlowMAC at 3 kbps achieves quality similar to that of state-of-the-art GAN-based and DDPM-based neural audio codecs at double the bit rate. Moreover, FlowMAC offers a tunable inference pipeline that allows complexity to be traded off against quality. This enables real-time coding on CPU while maintaining high perceptual quality.
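The inference step described above, integrating a flow with an ODE solver, can be illustrated with a minimal sketch. This is not the FlowMAC implementation: it assumes a fixed-step Euler solver, and `toy_field` is a hypothetical stand-in for the learned decoder network, chosen only because its ODE has a closed-form solution. The names `euler_sample`, `toy_field`, and `cond` are illustrative and do not come from the paper. The sketch shows why the number of solver steps acts as a complexity/quality knob: each step costs one evaluation of the vector field, and more steps track the exact ODE solution more closely.

```python
import numpy as np

def euler_sample(vector_field, x0, cond, num_steps):
    """Integrate dx/dt = vector_field(x, t, cond) from t=0 to t=1 with fixed-step Euler."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # One vector-field (network) evaluation per step: cost grows linearly with num_steps.
        x = x + dt * vector_field(x, t, cond)
    return x

def toy_field(x, t, cond):
    # Hypothetical stand-in for a learned decoder: pulls x toward the conditioning vector.
    return cond - x

rng = np.random.default_rng(0)
cond = np.array([1.0, -2.0, 0.5])         # assumption: stands in for the dequantized latent
x0 = rng.standard_normal(cond.shape)      # noise sample at t=0

# Closed-form solution of the toy ODE at t=1, used as the reference.
exact = cond + (x0 - cond) * np.exp(-1.0)

# Maximum absolute deviation from the exact solution for several step counts.
errors = {
    n: float(np.max(np.abs(euler_sample(toy_field, x0, cond, n) - exact)))
    for n in (2, 8, 32)
}
for n, err in errors.items():
    print(f"steps={n:2d}  max abs error={err:.4f}")
```

In a real codec the vector field would be a neural network conditioned on the quantized representation, so halving the number of steps roughly halves decoder compute at the cost of a coarser approximation of the flow.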

