SD LG AS · Oct 22, 2020

LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation

Woosung Choi, Minseok Kim, Jaehwa Chung, Soonyoung Jung

arXiv:2010.11631v2

Originality: Incremental advance

AI Analysis

This work improves audio source separation for applications such as music production. It is an incremental advance that builds on the existing CUNet model.

The paper tackles multi-source audio separation by extending Frequency Transformation (FT) blocks to capture source-dependent frequency patterns and by proposing a gated modulation method, achieving state-of-the-art SDR performance on several MUSDB18 separation tasks.
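
To make "extending FT blocks to capture source-dependent patterns" concrete, here is a minimal sketch under stated assumptions. It assumes an FT block is a fully-connected layer applied along the frequency axis of every frame (shared across time), and it assumes a LaSAFT-style extension keeps one such transformation per latent source and mixes their outputs with attention weights computed from a condition-derived query and one learnable key per latent source. The function names (`ft_block`, `latent_source_attentive_ft`), the scaled dot-product scoring, and the omission of normalization, nonlinearities, and convolutional parts are all illustrative assumptions, not the paper's implementation.

```python
import math


def ft_block(spectrogram, weight):
    """Time-distributed frequency transformation (no bias or nonlinearity).

    spectrogram: T frames, each a list of F values.
    weight: F x F matrix (list of rows) shared by every frame.
    """
    return [[sum(w * x for w, x in zip(row, frame)) for row in weight]
            for frame in spectrogram]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def latent_source_attentive_ft(spectrogram, weights, keys, query):
    """Hypothetical, simplified LaSAFT-style block.

    weights: one F x F frequency-transformation matrix per latent source.
    keys: one d-dimensional key vector per latent source.
    query: d-dimensional vector derived from the target-source condition.
    Returns (mixed, attn): the attention-weighted sum of the per-latent-source
    FT outputs and the attention weights themselves.
    """
    d = len(query)
    # Scaled dot-product score between the condition query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    attn = softmax(scores)
    outputs = [ft_block(spectrogram, w) for w in weights]
    num_frames, num_bins = len(spectrogram), len(spectrogram[0])
    mixed = [[sum(a * out[t][f] for a, out in zip(attn, outputs))
              for f in range(num_bins)]
             for t in range(num_frames)]
    return mixed, attn


# Usage: two latent sources, one keeping the bin layout and one swapping bins.
spec = [[1.0, 0.0], [0.0, 2.0]]          # T=2 frames, F=2 bins
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
mixed, attn = latent_source_attentive_ft(spec, [identity, swap], keys, [0.0, 0.0])
print(attn)   # [0.5, 0.5]: a neutral query weights both latent sources equally
print(mixed)  # [[0.5, 0.5], [1.0, 1.0]]
```

A query aligned with one key (for example `[10.0, 0.0]`) pushes nearly all attention onto that latent source, so the output approaches that source's frequency transformation. This is the intuition behind making the frequency patterns depend on the requested source.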

Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns. The goal of this paper is to extend the FT block to fit the multi-source task. We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns. We also propose the Gated Point-wise Convolutional Modulation (GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate internal features. By employing these two novel methods, we extend the Conditioned-U-Net (CUNet) for multi-source separation, and the experimental results indicate that our LaSAFT and GPoCM can improve the CUNet's performance, achieving state-of-the-art SDR performance on several MUSDB18 source separation tasks.
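The abstract describes GPoCM as an extension of FiLM. The sketch below contrasts the two under stated assumptions. FiLM scales and shifts each channel independently. For GPoCM, it assumes a point-wise (1x1) convolution whose weights and bias come from the condition, followed by a sigmoid whose output gates the original features element-wise. The condition network that would produce `gamma`, `beta`, `weight`, and `bias` is not shown, and the function names and exact formula are assumptions for illustration.

```python
import math


def film(features, gamma, beta):
    """FiLM: per-channel affine modulation, y_c = gamma_c * x_c + beta_c."""
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gpocm(features, weight, bias):
    """Gated point-wise convolutional modulation (sketch).

    features: C channels, each a list of N time-frequency positions.
    weight (C x C) and bias (length C) stand in for parameters that a
    condition network would generate; that network is not shown.
    At every position, a 1x1 convolution mixes the channels, and its
    sigmoid output gates the original features element-wise.
    """
    num_channels, num_positions = len(features), len(features[0])
    out = [[0.0] * num_positions for _ in range(num_channels)]
    for n in range(num_positions):
        column = [features[c][n] for c in range(num_channels)]
        for i in range(num_channels):
            pre = sum(weight[i][j] * column[j] for j in range(num_channels)) + bias[i]
            out[i][n] = sigmoid(pre) * column[i]
    return out


features = [[1.0, 2.0], [3.0, 4.0]]      # C=2 channels, N=2 positions
print(film(features, gamma=[2.0, 0.5], beta=[0.0, 1.0]))
# [[2.0, 4.0], [2.5, 3.0]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
print(gpocm(features, zero_w, bias=[0.0, 0.0]))
# [[0.5, 1.0], [1.5, 2.0]]: a zero pre-activation gives a gate of 0.5
```

Without the sigmoid gate, a diagonal weight matrix reduces the point-wise convolution to FiLM (gamma on the diagonal, beta as the bias). This shows how a point-wise convolution generalizes FiLM by also letting channels influence each other. In this sketch the gate additionally keeps each multiplier between 0 and 1, so modulation can only attenuate features.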
