SpectroStream: A Versatile Neural Codec for General Audio
This work addresses neural audio compression for music applications, improving on existing codecs such as SoundStream by supporting full-band, multi-channel audio.
The authors target high-quality compression of 48 kHz stereo music at low bit rates (4-16 kbps). Their codec, SpectroStream, operates in the time-frequency domain and improves audio quality at higher sample rates compared to its predecessor, SoundStream.
We propose SpectroStream, a full-band multi-channel neural audio codec. As the successor to the well-established SoundStream, SpectroStream extends its capability beyond 24 kHz monophonic audio and enables high-quality reconstruction of 48 kHz stereo music at bit rates of 4-16 kbps. This is accomplished with a new neural architecture that operates on a time-frequency representation of audio, which leads to better audio quality, especially at higher sample rates. The model also uses a delayed-fusion strategy to handle multi-channel audio, which is crucial for balancing per-channel acoustic quality against cross-channel phase consistency.
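To make "time-frequency representation" and "cross-channel phase" concrete, the following is a minimal illustrative sketch and not the SpectroStream architecture. It computes one short-time Fourier transform (STFT) per channel of a synthetic 48 kHz stereo signal and reads the inter-channel phase offset from the complex spectrograms. The 20 ms frame, 10 ms hop, Hann window, test tone, and the helper names `stft`, `stereo_spectrogram`, and `interchannel_phase` are all assumptions chosen for illustration; the abstract does not specify the transform settings or how the delayed fusion is implemented.

```python
import numpy as np

SAMPLE_RATE = 48_000  # full-band sample rate named in the abstract


def stft(x, frame_len=960, hop=480):
    """Hann-windowed STFT of a mono signal.

    frame_len=960 (20 ms) and hop=480 (10 ms) are illustrative
    assumptions, not settings taken from the paper.
    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def stereo_spectrogram(audio, **kwargs):
    """Transform each channel independently.

    audio: real array of shape (2, n_samples).
    Returns a complex array of shape (2, n_frames, n_bins).
    """
    return np.stack([stft(channel, **kwargs) for channel in audio])


def interchannel_phase(spec):
    """Phase of the right channel relative to the left (radians),
    measured at the strongest frequency bin of the left channel."""
    left, right = spec
    peak_bin = np.argmax(np.abs(left).mean(axis=0))
    cross = right[:, peak_bin] * np.conj(left[:, peak_bin])
    return float(np.angle(cross.mean()))


# One second of a 440 Hz tone; the right channel leads by a quarter cycle.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
left = np.sin(2 * np.pi * 440 * t)
right = np.sin(2 * np.pi * 440 * t + np.pi / 2)

spec = stereo_spectrogram(np.stack([left, right]))
phase = interchannel_phase(spec)
print(spec.shape)
print(round(phase, 3))
```

Keeping the channels as separate complex spectrograms preserves both per-channel content and the phase relationship between channels, which is the quantity a multi-channel codec has to keep consistent. For a sense of scale, assuming 16-bit PCM (the abstract does not state a bit depth), uncompressed 48 kHz stereo runs at 48,000 × 2 × 16 = 1,536 kbps, so 4-16 kbps corresponds to roughly 384x to 96x compression.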