Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport
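This paper addresses a specific bottleneck in neural audio synthesis: pitch estimators and synthesizers are hard to train jointly with a standard reconstruction loss, so systems fall back on external pitch trackers. It is aimed at researchers and developers working on unsupervised parameter estimation.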
The paper tackles the challenge of jointly training pitch estimators and synthesizers in neural audio processing by proposing a spectral loss function, based on optimal transport theory, that minimizes the displacement of spectral energy. The authors demonstrate the approach on an unsupervised autoencoding task that fits a harmonic template to harmonic signals, jointly estimating the fundamental frequency and the harmonic amplitudes.
In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is a challenge when using standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy. We validate this approach through an unsupervised autoencoding task that fits a harmonic template to harmonic signals. We jointly estimate the fundamental frequency and amplitudes of harmonics using a lightweight encoder and reconstruct the signals using a differentiable harmonic synthesizer. The proposed approach offers a promising direction for improving unsupervised parameter estimation in neural audio applications.
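To make the idea of "displacement of spectral energy" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the synthesizer is a plain sum of sinusoids with a constant fundamental, the transport cost is a 1-D Wasserstein-1 distance computed from cumulative spectra, and the names `harmonic_synth`, `magnitude_spectrum`, and `spectral_w1`, the sample rate, and the frame length are all hypothetical choices made for this example. The paper's exact loss and its differentiable (autodiff) formulation may differ.

```python
import numpy as np

def harmonic_synth(f0, amps, sr=16000, n=2048):
    """Sum of sinusoids at integer multiples of f0 (f0 held constant over the frame)."""
    t = np.arange(n) / sr
    freqs = np.arange(1, len(amps) + 1) * f0
    amps = np.where(freqs < sr / 2, amps, 0.0)  # silence harmonics above Nyquist
    return (amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)

def magnitude_spectrum(x):
    """Magnitude of the Hann-windowed real FFT of one frame."""
    return np.abs(np.fft.rfft(x * np.hanning(len(x))))

def spectral_w1(spec_a, spec_b, freqs):
    """1-D Wasserstein-1 distance (in Hz) between two spectra normalized to unit mass.

    Assumption: W1 via the difference of cumulative distributions is used here
    as a simple stand-in for an optimal-transport spectral loss.
    """
    pa = spec_a / spec_a.sum()
    pb = spec_b / spec_b.sum()
    cdf_diff = np.cumsum(pa - pb)
    return np.sum(np.abs(cdf_diff)) * (freqs[1] - freqs[0])

# Compare a bin-wise squared error with the transport distance as a
# candidate f0 approaches the target f0 (illustrative values only).
sr, n = 16000, 2048
freqs = np.fft.rfftfreq(n, d=1 / sr)
amps = np.ones(8)
target = magnitude_spectrum(harmonic_synth(220.0, amps, sr, n))

for f0 in (180.0, 200.0, 215.0, 220.0):
    cand = magnitude_spectrum(harmonic_synth(f0, amps, sr, n))
    l2 = np.sum((cand - target) ** 2)
    w1 = spectral_w1(cand, target, freqs)
    print(f"f0={f0:6.1f} Hz  bin-wise L2={l2:12.2f}  W1={w1:8.2f} Hz")
```

In this toy setting the transport distance shrinks steadily as the candidate fundamental moves toward the target, because it measures how far energy must travel along the frequency axis. A bin-wise distance only responds where spectral peaks overlap, which is one way to see why a standard reconstruction loss gives a pitch estimator little usable signal.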