SD · May 6, 2017

End-to-end Source Separation with Adaptive Front-Ends

arXiv:1705.02514v2 · 77 citations
Originality: Highly original
AI Analysis

This work addresses a bottleneck in end-to-end learning for audio applications. It offers an adaptive, learnable front-end that improves source separation.

The paper tackles the lack of a neural-network equivalent of the short-time Fourier transform (STFT) in audio source separation. It introduces an auto-encoder that learns real-valued basis functions directly from raw waveforms, and these learned transforms yield significantly better separation than Fourier-based front-ends.
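A minimal sketch of the idea, assuming a purely linear auto-encoder over windowed frames. The paper's actual layers, nonlinearities, window size, and hop size are not given on this page, so every function name and hyperparameter below is hypothetical. An analysis matrix `enc` plays the role of the forward STFT and a synthesis matrix `dec` plays the role of the inverse STFT. Both are real-valued and are learned by gradient descent on frame reconstruction error rather than fixed in advance.

```python
import numpy as np

def frame_signal(x, win, hop):
    """Slice a 1-D waveform into overlapping frames, shape (n_frames, win)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def overlap_add(frames, hop):
    """Sum frames back into a waveform. With hop < win, overlapping samples are
    summed, so callers must normalise by the overlap (or use a suitable window)."""
    n_frames, win = frames.shape
    out = np.zeros(hop * (n_frames - 1) + win)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + win] += f
    return out

def train_linear_frontend(x, win=64, hop=16, n_basis=64, lr=0.05, steps=300, seed=0):
    """Hypothetical linear auto-encoder: learn analysis (enc) and synthesis (dec)
    bases by minimising the mean squared frame reconstruction error."""
    rng = np.random.default_rng(seed)
    frames = frame_signal(x, win, hop)                # (T, win)
    enc = rng.normal(scale=0.1, size=(win, n_basis))  # learned "forward transform"
    dec = rng.normal(scale=0.1, size=(n_basis, win))  # learned "inverse transform"
    losses = []
    for _ in range(steps):
        coeffs = frames @ enc                         # front-end representation (T, n_basis)
        recon = coeffs @ dec                          # reconstructed frames (T, win)
        err = recon - frames
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error with respect to both bases.
        g_recon = 2.0 * err / err.size
        g_dec = coeffs.T @ g_recon
        g_enc = frames.T @ (g_recon @ dec.T)
        dec -= lr * g_dec
        enc -= lr * g_enc
    return enc, dec, losses
```

After training, `frame_signal(x, win, hop) @ enc` gives the adaptive front-end coefficients a separation network would operate on, and multiplying by `dec` then overlap-adding (normalised for the frame overlap) maps them back to a waveform. Unlike the STFT, nothing forces these bases to be sinusoids; they are whatever minimises the training objective.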

Source separation and other audio applications have traditionally relied on the use of short-time Fourier transforms as a front-end frequency domain representation step. The unavailability of a neural network equivalent to forward and inverse transforms hinders the implementation of end-to-end learning systems for these applications. We present an auto-encoder neural network that can act as an equivalent to short-time front-end transforms. We demonstrate the ability of the network to learn optimal, real-valued basis functions directly from the raw waveform of a signal and further show how it can be used as an adaptive front-end for supervised source separation. In terms of separation performance, these transforms significantly outperform their Fourier counterparts. Finally, we also propose a novel cost function based on the source-to-distortion ratio (SDR) for end-to-end source separation.
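The exact form of the proposed SDR-based cost function is not given on this page. As an assumption, the sketch below uses the common projection-based SDR: the estimate is split into a target part (its projection onto the reference signal) and a distortion remainder, and SDR = 10·log10(target energy / distortion energy). A training cost can then be the negative SDR, so that raising SDR lowers the cost. The names `sdr_db` and `neg_sdr_loss` are hypothetical.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-12):
    """Projection-based SDR in dB (assumed definition, not necessarily the paper's)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Scale of the reference that best explains the estimate (least squares).
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    distortion = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps))

def neg_sdr_loss(reference, estimate):
    """Cost to minimise during training: higher SDR gives a lower cost."""
    return -sdr_db(reference, estimate)
```

Because the target is a rescaled reference, this SDR is insensitive to the overall gain of the estimate, which a plain mean-squared-error loss is not. For actual end-to-end training the same expression would be written in an automatic-differentiation framework; NumPy is used here only to show the arithmetic.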

Code implementations: 1 repo