AS · SD · ML · Nov 5, 2021

Hybrid Spectrogram and Waveform Source Separation

arXiv:2111.03600v3 · 234 citations
Originality: Highly original
AI Analysis

This work addresses music source separation, the task of splitting a mixed recording into its individual sources.

The paper proposes a hybrid version of the Demucs architecture that operates on both the spectrogram and the waveform, letting the model decide which domain suits each source or combine the two. It reports a 1.4 dB SDR improvement across all sources on the MusDB HQ dataset and won the Music Demixing Challenge 2021.

Source separation models work in either the spectrogram or the waveform domain. In this work, we show how to perform end-to-end hybrid source separation, letting the model decide which domain is best suited for each source, and even combining both. The proposed hybrid version of the Demucs architecture won the Music Demixing Challenge 2021 organized by Sony. This architecture also comes with additional improvements, such as compressed residual branches, local attention or singular value regularization. Overall, a 1.4 dB improvement of the Signal-To-Distortion Ratio (SDR) was observed across all sources as measured on the MusDB HQ dataset, an improvement confirmed by human subjective evaluation, with an overall quality rated at 2.83 out of 5 (2.36 for the non-hybrid Demucs), and absence of contamination at 3.04 (against 2.37 for the non-hybrid Demucs and 2.44 for the second-ranking model submitted to the competition).
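To make the 1.4 dB figure concrete, here is a minimal sketch of how an SDR in decibels can be computed. The abstract does not say which SDR variant was used, so the sketch assumes the plain energy-ratio definition, SDR = 10 log10(sum of reference energy / sum of error energy). The function name `sdr_db`, the `eps` stabilizer, and the synthetic signals are illustrative and not taken from the paper.

```python
import math
import random


def sdr_db(reference, estimate, eps=1e-8):
    """Signal-to-Distortion Ratio in dB (assumed energy-ratio definition).

    reference: ground-truth source samples
    estimate:  separated source samples, same length
    eps:       small constant to avoid division by zero (an assumption)
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))


# Synthetic illustration only: one second of noise-like "audio" at 44.1 kHz.
random.seed(0)
n = 44100
reference = [random.gauss(0.0, 1.0) for _ in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

# A hypothetical baseline estimate with residual error at half amplitude.
baseline = [r + 0.5 * v for r, v in zip(reference, noise)]

# Scaling the error amplitude by 10^(-1.4/20) lowers its power by 1.4 dB.
scale = 10 ** (-1.4 / 20)
improved = [r + 0.5 * scale * v for r, v in zip(reference, noise)]

gain_db = sdr_db(reference, improved) - sdr_db(reference, baseline)
print(f"SDR gain: {gain_db:.2f} dB")
```

Under this definition, a 1.4 dB gain corresponds to the residual error energy dropping to about 72% of its previous value (10^(-0.14) ≈ 0.72), assuming the reference signal is unchanged.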

Code Implementations: 1 repo