LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation
This work improves audio source separation for applications such as music production, although the contribution is incremental, since it builds on the existing CUNet model.
The paper tackles multi-source audio separation by extending Frequency Transformation (FT) blocks to capture source-dependent patterns and by proposing a gated modulation method; it reports state-of-the-art SDR performance on several MUSDB18 tasks.
Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns. The goal of this paper is to extend the FT block to fit the multi-source task. We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns. We also propose the Gated Point-wise Convolutional Modulation (GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate internal features. By employing these two novel methods, we extend the Conditioned-U-Net (CUNet) for multi-source separation, and the experimental results indicate that our LaSAFT and GPoCM can improve the CUNet's performance, achieving state-of-the-art SDR performance on several MUSDB18 source separation tasks.
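To make the relationship between FiLM and GPoCM concrete, the following is a minimal sketch in plain Python rather than the authors' implementation. It assumes features are stored as a list of channels, each a flat list of values, and that the modulation parameters (`gamma`, `beta`, `weight`, `bias`, all hypothetical names) have already been produced from the conditioning input by some generator network. The gating form used here, a sigmoid of the point-wise convolution output multiplied element-wise with the input, is our reading of "gated" and should be checked against the paper's formal definition.

```python
import math


def film(x, gamma, beta):
    """FiLM: per-channel affine transform, y[c] = gamma[c] * x[c] + beta[c]."""
    return [[gamma[c] * v + beta[c] for v in x[c]] for c in range(len(x))]


def pocm(x, weight, bias):
    """Point-wise (1x1) convolution: every output channel is a weighted
    sum of all input channels at the same position, plus a bias."""
    n_in = len(x)
    n_pos = len(x[0])
    return [
        [sum(weight[o][k] * x[k][i] for k in range(n_in)) + bias[o]
         for i in range(n_pos)]
        for o in range(len(weight))
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gpocm(x, weight, bias):
    """Gated variant (assumed form): the sigmoid of the point-wise
    convolution output gates the input element-wise. This requires a
    square weight matrix so the gate has the same shape as x."""
    gate = pocm(x, weight, bias)
    return [[sigmoid(g) * v for g, v in zip(gate[c], x[c])]
            for c in range(len(x))]


# Two channels, two positions each.
x = [[1.0, 2.0], [3.0, 4.0]]

# FiLM scales and shifts each channel independently.
y_film = film(x, gamma=[2.0, 0.5], beta=[0.0, 1.0])

# With a diagonal weight matrix, the point-wise convolution reduces to FiLM;
# off-diagonal weights are what let channels interact.
y_pocm = pocm(x, weight=[[2.0, 0.0], [0.0, 0.5]], bias=[0.0, 1.0])

# With all-zero parameters the gate is sigmoid(0) = 0.5 everywhere.
y_gated = gpocm(x, weight=[[0.0, 0.0], [0.0, 0.0]], bias=[0.0, 0.0])
```

Under these assumptions, FiLM is the special case in which the point-wise convolution weights are diagonal, which is one way to read the abstract's description of GPoCM as an extension of FiLM.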