Hybrid Y-Net Architecture for Singing Voice Separation
This work addresses singing voice separation for music processing applications, presenting an incremental improvement over existing methods.
The paper tackles music source separation by proposing a hybrid Y-Net architecture that extracts features from both spectrogram and waveform domains, achieving effective separation with fewer parameters.
This paper presents Y-Net, a novel deep neural network architecture for music source separation. The architecture performs end-to-end hybrid source separation by extracting features from both the spectrogram and waveform domains. Inspired by the U-Net architecture, Y-Net predicts a spectrogram mask that separates the vocal source from the mixture signal. Our results show that the proposed architecture separates sources effectively while using fewer parameters. Overall, this work offers a promising approach to improving both the accuracy and the efficiency of music source separation.
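To make the mask-based formulation concrete, the following minimal sketch shows how a predicted spectrogram mask could be applied to a mixture. It is an illustration under stated assumptions, not the Y-Net implementation: the helper names `stft` and `separate_with_mask`, the frame length and hop size, and the random placeholder mask (standing in for the network output) are all hypothetical and do not come from the paper.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins).

    Frame length and hop size are illustrative choices, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def separate_with_mask(mixture_spec, mask):
    """Apply a soft mask in [0, 1] to a complex mixture spectrogram.

    Multiplying by a real-valued mask scales the magnitude and reuses the
    mixture phase, a common choice in mask-based separation.
    """
    mask = np.clip(mask, 0.0, 1.0)
    vocals = mask * mixture_spec
    accompaniment = (1.0 - mask) * mixture_spec
    return vocals, accompaniment


rng = np.random.default_rng(0)
mixture = rng.standard_normal(2048)   # stand-in for a mixture waveform
mixture_spec = stft(mixture)

# Placeholder for the mask a trained network would predict (assumption).
mask = rng.uniform(0.0, 1.0, size=mixture_spec.shape)

vocals_spec, accomp_spec = separate_with_mask(mixture_spec, mask)
```

With a soft mask of this kind, the two estimates sum back to the mixture spectrogram, so whatever is not assigned to the vocals is assigned to the accompaniment.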