Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks
This paper addresses impulse response estimation for audio processing in augmented and virtual reality. The contribution appears incremental, since it builds on existing neural representation learning techniques.
The authors tackle impulse response estimation in high-noise, uncontrolled environments by proposing a neural network framework that jointly estimates the impulse response and the spectral noise characteristics of the observed signal. They demonstrate robust estimation at low signal-to-noise ratios and report strong results on real-world speech data. The framework also supports interpolating impulse responses on a spatial grid and compressing them for AR/VR applications.
Impulse response estimation in high-noise and in-the-wild settings, with minimal control of the underlying data distributions, is a challenging problem. We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning. Our framework is driven by a carefully designed neural network that jointly estimates the impulse response and the (a priori unknown) spectral noise characteristics of an observed signal given the source signal. We demonstrate robustness in estimation, even under low signal-to-noise ratios, and show strong results when learning from spatio-temporal real-world speech data. Our framework provides a natural way to interpolate impulse responses on a spatial grid, while also allowing them to be compressed and stored efficiently for real-time rendering applications in augmented and virtual reality.
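
To make the estimation problem concrete, the sketch below sets up the usual signal model, in which the observed signal is the source convolved with an impulse response plus additive noise, and recovers the impulse response with a classical regularized frequency-domain deconvolution. This is not the neural framework proposed in the paper; it is a simple baseline chosen only for illustration. The function names, the white Gaussian noise model, and the `reg_fraction` parameter are all assumptions introduced here.

```python
import numpy as np


def simulate_observation(source, impulse_response, snr_db, rng):
    """Convolve the source with an impulse response and add white noise.

    The white Gaussian noise model is an assumption made for this sketch;
    the abstract treats the spectral noise characteristics as unknown.
    """
    clean = np.convolve(source, impulse_response)  # full linear convolution
    signal_power = np.mean(clean ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise


def estimate_impulse_response(source, observed, ir_length, reg_fraction=1e-3):
    """Regularized frequency-domain deconvolution (a classical baseline).

    `reg_fraction` is a hypothetical tuning knob: the regularizer is set to
    this fraction of the mean source power spectrum so that weak frequency
    bins do not amplify the noise without bound.
    """
    # Using the observed length as the FFT size makes the circular
    # convolution implied by the FFT match the linear convolution above.
    n_fft = len(observed)
    source_spec = np.fft.rfft(source, n_fft)
    observed_spec = np.fft.rfft(observed, n_fft)
    power = np.abs(source_spec) ** 2
    reg = reg_fraction * np.mean(power)
    ir_spec = np.conj(source_spec) * observed_spec / (power + reg)
    ir_estimate = np.fft.irfft(ir_spec, n_fft)
    return ir_estimate[:ir_length]
```

A typical use would generate or load a known source signal, pass it together with the recording to `estimate_impulse_response`, and compare the result against a reference impulse response where one is available. A baseline of this kind degrades as the signal-to-noise ratio drops or when the noise is not white, which is the regime the abstract targets.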