wav2shape: Hearing the Shape of a Drum Machine
This work addresses a challenging problem in audio signal processing, with applications in musical acoustics and structural engineering. Its contribution is incremental, however, because it builds on existing methods such as scattering transforms and deep learning.
The paper tackles the inverse problem of recovering physical attributes, such as shape and material, from audio waveforms. It does so by combining time-frequency analysis with supervised learning to estimate drum resonator parameters from synthesized sounds.
Disentangling and recovering physical attributes, such as shape and material, from a few waveform examples is a challenging inverse problem in audio signal processing, with numerous applications in musical acoustics as well as structural engineering. We propose to address this problem via a combination of time-frequency analysis and supervised machine learning. We start by synthesizing a dataset of sounds using the functional transformation method. Then, we represent each percussive sound in terms of its time-invariant scattering transform coefficients and formulate the parametric estimation of the resonator as multidimensional regression with a deep convolutional neural network. We interpolate scattering coefficients over the surface of the drum as a surrogate for potentially missing data, and study the response of the neural network to interpolated samples. Lastly, we resynthesize drum sounds from scattering coefficients, thereby paving the way towards a deep generative model of drum sounds whose latent variables are physically interpretable.
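As a rough illustration of the first two stages described above (synthesizing a percussive sound from physical parameters, then summarizing it with a time-invariant representation), the following is a minimal sketch under simplifying assumptions. It does not implement the functional transformation method or the scattering transform: the sound is a plain sum of damped sinusoids at the mode frequencies of an ideal rectangular membrane, and the feature is a time-averaged log-magnitude spectrum used as a crude stand-in. The function names `synth_rect_drum` and `time_averaged_spectrum`, the parameters `f0`, `alpha` and `decay`, and the damping and amplitude rules are all hypothetical choices made for this example.

```python
import numpy as np


def synth_rect_drum(f0=200.0, alpha=1.5, decay=8.0, sr=22050, dur=1.0, n_modes=4):
    """Toy modal synthesis of an ideal rectangular membrane (hypothetical helper).

    f0:    frequency of the (1, 1) mode in Hz
    alpha: side ratio Lx / Ly of the membrane
    decay: decay rate of the (1, 1) mode in 1/s; higher modes are assumed
           to decay proportionally faster (an assumption of this sketch)
    """
    t = np.arange(int(sr * dur)) / sr
    base = np.sqrt(1.0 + alpha ** 2)  # normalizes the (1, 1) mode to f0
    y = np.zeros_like(t)
    for m in range(1, n_modes + 1):
        for n in range(1, n_modes + 1):
            # Ideal membrane: f_mn is proportional to sqrt((m/Lx)^2 + (n/Ly)^2)
            f = f0 * np.sqrt(m ** 2 + (alpha * n) ** 2) / base
            if f >= sr / 2:
                continue  # skip modes above the Nyquist frequency
            envelope = np.exp(-decay * (f / f0) * t)
            y += envelope * np.sin(2 * np.pi * f * t) / (m * n)
    return y / np.max(np.abs(y))


def time_averaged_spectrum(y, n_fft=1024, hop=512):
    """Time-averaged log-magnitude spectrum (crude stand-in, not a scattering transform)."""
    window = np.hanning(n_fft)
    frames = np.array(
        [y[i:i + n_fft] * window for i in range(0, len(y) - n_fft + 1, hop)]
    )
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # Averaging over frames discards the time position of the signal content
    return np.log1p(mags.mean(axis=0))


# Usage: one sound and its feature vector
y = synth_rect_drum(f0=200.0, alpha=1.5)
features = time_averaged_spectrum(y)
print(y.shape, features.shape)
```

In a pipeline of the kind the abstract outlines, many such (parameters, features) pairs would form the training set of a regression model that maps features back to the resonator parameters. Here the regressor is left out, and the feature extractor would be replaced by a proper scattering transform.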