SD AI AS | Aug 7, 2025

SpectroStream: A Versatile Neural Codec for General Audio

Yunpeng Li, Kehang Han, Brian McWilliams, Zalan Borsos, Marco Tagliasacchi

arXiv:2508.05207v1 | 7 citations | h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses neural audio compression for music, offering an incremental improvement over existing codecs such as SoundStream.

The authors tackle high-quality compression of 48 kHz stereo music at low bit rates (4-16 kbps). Their codec, SpectroStream, works on a time-frequency representation of the audio and improves on the quality of its predecessor, SoundStream, especially at higher sample rates.
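
To put the 4-16 kbps range in perspective, the short calculation below compares it with uncompressed audio. It is only an illustrative sketch: the 16-bit PCM baseline is an assumption made here for comparison and is not stated in the paper summary, and the helper name `pcm_bitrate_kbps` is hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, channels: int, bits_per_sample: int) -> float:
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * channels * bits_per_sample / 1000


# Assumed baseline: 48 kHz stereo at 16 bits per sample (not from the source).
raw_kbps = pcm_bitrate_kbps(48_000, 2, 16)

# Compression ratio relative to that baseline for a few target bit rates
# inside the 4-16 kbps range quoted above.
ratios = {target: raw_kbps / target for target in (4, 8, 16)}

for target, ratio in ratios.items():
    print(f"{target:>2} kbps target: about {ratio:.0f}x smaller than {raw_kbps:.0f} kbps PCM")
```

Under that assumed baseline, the codec's operating range corresponds to roughly two orders of magnitude of compression.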

We propose SpectroStream, a full-band multi-channel neural audio codec. Successor to the well-established SoundStream, SpectroStream extends its capability beyond 24 kHz monophonic audio and enables high-quality reconstruction of 48 kHz stereo music at bit rates of 4-16 kbps. This is accomplished with a new neural architecture that leverages audio representation in the time-frequency domain, which leads to better audio quality, especially at higher sample rates. The model also uses a delayed-fusion strategy to handle multi-channel audio, which is crucial in balancing per-channel acoustic quality and cross-channel phase consistency.
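
As a rough illustration of what a time-frequency representation of stereo audio looks like, the sketch below computes a short-time Fourier transform (STFT) independently for each channel with NumPy. This is not SpectroStream's architecture: the abstract does not specify the transform, the frame and hop sizes, or where the channels are fused, so the function names (`stft`, `stereo_to_tf`) and all parameters here are assumptions chosen for the example.

```python
import numpy as np


def stft(signal: np.ndarray, frame_len: int = 960, hop: int = 480) -> np.ndarray:
    """Hann-windowed STFT of a mono signal; returns (n_frames, frame_len // 2 + 1) complex values."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def stereo_to_tf(audio: np.ndarray) -> np.ndarray:
    """Map (channels, samples) audio to real features of shape (channels, frames, bins, 2).

    Each channel is transformed separately. Keeping real and imaginary parts
    preserves phase, which is what carries the timing difference between channels.
    """
    specs = np.stack([stft(channel) for channel in audio])
    return np.stack([specs.real, specs.imag], axis=-1)


# One second of synthetic 48 kHz stereo: the same 440 Hz tone with a phase
# offset between left and right (illustrative test signal only).
sample_rate = 48_000
t = np.arange(sample_rate) / sample_rate
left = np.sin(2 * np.pi * 440 * t)
right = np.sin(2 * np.pi * 440 * t + np.pi / 4)

features = stereo_to_tf(np.stack([left, right]))
print(features.shape)
```

In this sketch the two channels stay separate all the way through; a model following the delayed-fusion idea described above would combine them at some later stage, which the abstract does not detail.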
