Differentiable Signal Processing With Black-Box Audio Effects
This work addresses the problem of automating audio production tasks for audio engineers and creators. Its approach is novel but incremental, combining existing methods.
The paper tackles automating audio signal processing by integrating non-differentiable black-box audio effects into deep neural networks, enabling tasks such as tube amplifier emulation and music mastering, the latter with results comparable to a state-of-the-art commercial solution.
We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network. We then train a deep encoder to analyze input audio and control effect parameters to perform the desired signal manipulation, requiring only input-target paired audio data as supervision. To train our network with non-differentiable black-box effects layers, we use a fast, parallel stochastic gradient approximation scheme within a standard automatic differentiation graph, yielding efficient end-to-end backpropagation. We demonstrate the power of our approach with three separate automatic audio production applications: tube amplifier emulation, automatic removal of breaths and pops from voice recordings, and automatic music mastering. We validate our results with a subjective listening test, showing that our approach can not only enable new automatic audio effects tasks, but also yield results comparable to a specialized, state-of-the-art commercial solution for music mastering.
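To make the gradient approximation idea concrete, the following is a minimal sketch of how a gradient with respect to effect parameters could be estimated for a black-box layer. It assumes a simultaneous-perturbation (SPSA-style) estimator, which is one common stochastic scheme of this kind; the abstract does not name the exact scheme. The effect `black_box_gain`, the helper `spsa_param_grad`, and the step size `epsilon` are hypothetical names chosen for illustration, and the sketch ignores effect state.

```python
import numpy as np

def black_box_gain(audio, params):
    """Hypothetical stand-in for a third-party effect (gain plus soft clipping).

    The estimator below only calls it; it never looks inside.
    """
    return np.tanh(params[0] * audio)

def spsa_param_grad(effect, audio, params, upstream_grad, epsilon=1e-3, rng=None):
    """Estimate dLoss/dparams for a non-differentiable effect (assumed SPSA-style scheme).

    upstream_grad is dLoss/d(effect output), as supplied by backpropagation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Perturb all parameters at once with a random +/-1 direction.
    delta = rng.choice([-1.0, 1.0], size=params.shape)
    # Two effect evaluations per estimate; these are independent, so they
    # (and the evaluations for different batch items) could run in parallel.
    y_plus = effect(audio, params + epsilon * delta)
    y_minus = effect(audio, params - epsilon * delta)
    # Finite-difference estimate of the loss change along the direction delta.
    directional = np.sum(upstream_grad * (y_plus - y_minus)) / (2.0 * epsilon)
    # Dividing by delta distributes the estimate over the parameters.
    return directional / delta

audio = np.linspace(-1.0, 1.0, 64)
params = np.array([2.0])
upstream = audio.copy()  # pretend gradient arriving from the loss
estimate = spsa_param_grad(black_box_gain, audio, params, upstream)
```

In a training framework, an estimator like this would sit in the backward pass of a custom layer, so the encoder that predicts `params` could be trained by ordinary backpropagation while the effect itself stays a black box.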