SD, LG, AS | Aug 26, 2022

Music Separation Enhancement with Generative Modeling

arXiv:2208.12387v1 | 13 citations | h-index: 34
Originality: Synthesis-oriented
AI Analysis

This work addresses audio quality issues in music separation that matter to listeners and audio engineers. It is incremental in that it builds on existing separation systems.

The paper targets perceptual shortcomings of music source separation systems, such as added noise and lost harmonics. It proposes a post-processing model (MSG) that improves source reconstruction for both waveform-based and spectrogram-based separators. In crowdsourced evaluations, human listeners preferred the MSG-enhanced bass and drum estimates.

Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model (the Make it Sound Good (MSG) post-processor) to enhance the output of music source separation systems. We apply our post-processing model to state-of-the-art waveform-based and spectrogram-based music source separators, including a separator unseen by MSG during training. Our analysis of the errors produced by source separators shows that waveform models tend to introduce more high-frequency noise, while spectrogram models tend to lose transients and high-frequency content. We introduce objective measures to quantify both kinds of errors and show MSG improves the source reconstruction of both kinds of errors. Crowdsourced subjective evaluations demonstrate that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.
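
To make the two error types in the abstract concrete, the sketch below compares the share of spectral energy above a cutoff frequency in a reference signal and in two toy "estimates". This is not the paper's objective measure, which the abstract does not specify. The function name `high_freq_energy_ratio`, the 8 kHz cutoff, and the synthetic signals are all assumptions chosen for illustration.

```python
import numpy as np


def high_freq_energy_ratio(signal, sample_rate, cutoff_hz=8000.0):
    """Fraction of spectral energy at or above cutoff_hz.

    Hypothetical illustrative measure, not the one defined in the paper.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    energy = np.abs(spectrum) ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0
    return float(energy[freqs >= cutoff_hz].sum() / total)


# Toy one-second "source": a 100 Hz tone plus a quieter 12 kHz component.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
low = np.sin(2 * np.pi * 100.0 * t)
high = 0.5 * np.sin(2 * np.pi * 12000.0 * t)
reference = low + high

# Assumed stand-in for a waveform-model error: added broadband noise.
rng = np.random.default_rng(0)
noisy_estimate = reference + 0.3 * rng.standard_normal(t.shape)

# Assumed stand-in for a spectrogram-model error: high-frequency content lost.
dull_estimate = low

ref_ratio = high_freq_energy_ratio(reference, sample_rate)
noisy_ratio = high_freq_energy_ratio(noisy_estimate, sample_rate)
dull_ratio = high_freq_energy_ratio(dull_estimate, sample_rate)

print(f"reference: {ref_ratio:.3f}")
print(f"noisy estimate: {noisy_ratio:.3f}")
print(f"dull estimate: {dull_ratio:.3f}")
```

In this toy setup the noisy estimate has a higher high-frequency share than the reference and the dull estimate a lower one, mirroring the direction of the two error types described above. A real evaluation would use actual separator outputs and the measures defined in the paper.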

