May 6, 2017

End-to-end Source Separation with Adaptive Front-Ends

Shrikant Venkataramani, Jonah Casebeer, Paris Smaragdis

arXiv:1705.02514v2

Originality: Highly original

AI Analysis

This paper addresses a bottleneck in end-to-end learning for audio applications by offering an adaptive front-end that improves source separation.

The paper tackles the lack of neural-network equivalents of the short-time Fourier transform (STFT) in audio source separation. It introduces an auto-encoder that learns optimal, real-valued basis functions directly from raw waveforms, and the resulting front-end yields significantly better separation performance than Fourier-based methods.
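The encoder/decoder idea can be illustrated with a small sketch. It assumes non-overlapping frames (stride equal to the filter length) and uses a fixed orthonormal DCT-II basis as a stand-in for learned filters; the function names `orthonormal_dct_basis`, `analyze`, and `synthesize` are hypothetical. The encoder projects each frame onto the basis functions (a strided 1-D convolution), and the decoder sums the basis functions weighted by their coefficients (a transposed convolution). Because the basis is orthonormal, the decoder exactly inverts the encoder.

```python
import math


def orthonormal_dct_basis(n):
    """Rows are real-valued orthonormal DCT-II basis functions of length n.

    Hypothetical stand-in for learned filters: in a trained front-end these
    rows would be network weights rather than a fixed transform.
    """
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return basis


def analyze(signal, basis):
    """Encoder: strided 1-D convolution with stride = filter length (no overlap).

    Trailing samples that do not fill a whole frame are dropped.
    """
    n = len(basis[0])
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    # Project each frame onto every basis function: one coefficient vector per frame.
    return [[sum(b * x for b, x in zip(row, frame)) for row in basis] for frame in frames]


def synthesize(coeffs, basis):
    """Decoder: transposed convolution.

    Each frame is rebuilt as a coefficient-weighted sum of basis functions;
    since frames do not overlap, overlap-add reduces to concatenation.
    """
    n = len(basis[0])
    out = []
    for frame_coeffs in coeffs:
        frame = [sum(c * row[t] for c, row in zip(frame_coeffs, basis)) for t in range(n)]
        out.extend(frame)
    return out


if __name__ == "__main__":
    x = [math.sin(0.3 * t) + 0.1 * t for t in range(32)]
    basis = orthonormal_dct_basis(8)
    y = synthesize(analyze(x, basis), basis)
    print(max(abs(a - b) for a, b in zip(x, y)))  # close to 0 (floating-point error only)
```

In an adaptive front-end, the fixed rows would be replaced by trainable weights and the loops by a strided convolution and its transpose in a deep-learning framework. With overlapping frames, the decoder would overlap-add its outputs instead of concatenating them.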

Source separation and other audio applications have traditionally relied on the short-time Fourier transform as a front-end frequency-domain representation step. The unavailability of a neural network equivalent to the forward and inverse transforms hinders the implementation of end-to-end learning systems for these applications. We present an auto-encoder neural network that can act as an equivalent to short-time front-end transforms. We demonstrate the network's ability to learn optimal, real-valued basis functions directly from the raw waveform of a signal, and further show how it can be used as an adaptive front-end for supervised source separation. In terms of separation performance, these transforms significantly outperform their Fourier counterparts. Finally, we also propose a novel source-to-distortion ratio (SDR) based cost function for end-to-end source separation.
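The abstract does not give the exact form of the SDR-based cost. The sketch below assumes the common projection form of SDR, in which the component of the estimate along the reference counts as target and the remainder counts as distortion, and it uses the negative SDR as a loss to minimize. The names `sdr_db` and `sdr_loss` are hypothetical, and the paper's formulation may differ.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB (assumed projection form).

    The estimate is split into a scaled copy of the reference (target) and
    everything else (distortion); eps guards against division by zero.
    """
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference) + eps
    alpha = dot / ref_energy  # optimal gain applied to the reference
    target = [alpha * r for r in reference]
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((target_energy + eps) / (distortion_energy + eps))


def sdr_loss(reference, estimate):
    """Cost to minimize during training: the negative SDR."""
    return -sdr_db(reference, estimate)
```

For example, with reference `[1, 0, 0, 0]` and estimate `[1, 0.5, 0, 0]`, the distortion energy is 0.25 against a target energy of 1, giving about 6.02 dB. Because of the projection, rescaling the estimate leaves this value unchanged.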

