Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
This work addresses monaural audio source separation, with applications such as speech enhancement, but the contribution is incremental because it builds on existing convolutional methods.
The authors tackle monaural audio source separation by introducing multi-resolution fully convolutional neural networks (MR-FCNN), whose layers capture both global and local detail, and they report improved performance over feedforward DNNs and single-resolution FCNNs.
In deep neural networks with convolutional layers, each layer typically has a fixed-size, single-resolution receptive field (RF). Convolutional layers with a large RF capture global information from the input features, while layers with a small RF capture local details at high resolution. In this work, we introduce novel deep multi-resolution fully convolutional neural networks (MR-FCNN), in which each layer has several RF sizes and therefore extracts multi-resolution features that capture both the global and the local details of its input features. The proposed MR-FCNN is applied to separate a target audio source from a mixture of many audio sources. Experimental results show that MR-FCNN improves performance on the audio source separation problem compared with feedforward deep neural networks (DNNs) and single-resolution deep fully convolutional neural networks (FCNNs).
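The idea of a layer with several RF sizes can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1-D input, fixed averaging filters in place of learned weights, zero padding so that every feature map keeps the input length, and a ReLU nonlinearity. The names `conv1d_same`, `multi_resolution_layer`, and `rf_sizes` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Cross-correlate `signal` with an odd-length `kernel`.

    The input is zero-padded so the output has the same length as `signal`.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_resolution_layer(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Apply filters of different RF sizes to the same input.

    Returns one ReLU-activated feature map per kernel, stacked as channels.
    """
    return [[max(0.0, v) for v in conv1d_same(signal, k)] for k in kernels]


# Assumption: averaging filters with RF sizes 1, 3, and 5 stand in for
# learned filters; a trained network would learn these weights instead.
rf_sizes = (1, 3, 5)
kernels = [[1.0 / n] * n for n in rf_sizes]

# A single impulse makes the effect of each RF size easy to see.
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
features = multi_resolution_layer(signal, kernels)

for n, fmap in zip(rf_sizes, features):
    print(f"RF={n}: {[round(v, 3) for v in fmap]}")
```

The small filter leaves the impulse sharply localized, while the larger filters spread it over a wider neighborhood, so a following layer receives both fine and coarse views of the same input at once.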