LG · SD · AS · Sep 11, 2023

Instabilities in Convnets for Raw Audio

arXiv:2309.05855v4 · 3 citations · h-index: 11
AI Analysis

The paper addresses an instability problem that matters to researchers and practitioners who apply deep learning to audio signal processing, though its contribution is incremental because it builds on existing initialization theory.

The paper investigates why convolutional neural networks (convnets) for raw audio often fail to outperform hand-crafted filterbanks. It attributes this shortfall to instabilities at initialization, which worsen for large filters and locally periodic input signals. Numerical simulations suggest that the condition number of a convolutional layer follows a logarithmic scaling law between the number and length of the filters.

What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. These baselines are linear time-invariant systems: as such, they can be approximated by convnets with wide receptive fields. Yet, in practice, gradient-based optimization leads to suboptimal approximations. In our article, we approach this phenomenon from the perspective of initialization. We present a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights. We find that deviations worsen for large filters and locally periodic input signals, which are both typical for audio signal processing applications. Numerical simulations align with our theory and suggest that the condition number of a convolutional layer follows a logarithmic scaling law between the number and length of the filters, which is reminiscent of discrete wavelet bases.
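The quantities named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code, and it rests on several assumptions: convolution is circular with stride 1, the taps are drawn i.i.d. from a Gaussian with variance 1/(JT) for J filters of length T (so that the expected energy response equals the input energy), and the condition number is taken as the ratio B/A of the frame bounds. All function names are hypothetical. Under these assumptions, the frame bounds are the minimum and maximum over frequency of the summed squared magnitude responses of the filters.

```python
import numpy as np


def random_filterbank(num_filters, filter_len, rng):
    """Draw J FIR filters of length T with i.i.d. Gaussian taps.

    Assumption: variance 1/(J*T), chosen so that the expected
    energy response E[||Phi x||^2] equals ||x||^2.
    """
    sigma = 1.0 / np.sqrt(num_filters * filter_len)
    return rng.normal(0.0, sigma, size=(num_filters, filter_len))


def littlewood_paley_sum(weights, signal_len):
    """Sum over filters of squared magnitude frequency responses.

    Filters are zero-padded to signal_len (needs signal_len >= T).
    """
    spectra = np.fft.fft(weights, n=signal_len, axis=1)
    return np.sum(np.abs(spectra) ** 2, axis=0)


def condition_number(weights, signal_len):
    """Ratio B/A of the frame bounds (max over min of the sum above)."""
    lp = littlewood_paley_sum(weights, signal_len)
    return lp.max() / lp.min()


def energy_response(weights, x):
    """||Phi x||^2 for circular convolution with stride 1."""
    spectra = np.fft.fft(weights, n=len(x), axis=1)
    y = np.fft.ifft(spectra * np.fft.fft(x), axis=1)
    return float(np.sum(np.abs(y) ** 2))


def mean_condition_number(num_filters, filter_len, signal_len, trials, seed=0):
    """Average B/A over independent random initializations."""
    rng = np.random.default_rng(seed)
    values = [
        condition_number(
            random_filterbank(num_filters, filter_len, rng), signal_len
        )
        for _ in range(trials)
    ]
    return float(np.mean(values))


if __name__ == "__main__":
    # Sweep the filter length T at a fixed number of filters J.
    for filter_len in (16, 64, 256):
        kappa = mean_condition_number(
            num_filters=16, filter_len=filter_len, signal_len=1024, trials=20
        )
        print(f"T={filter_len:4d}  mean condition number = {kappa:.2f}")
```

Sweeping `num_filters` and `filter_len` in this way is one way to probe the scaling behaviour the abstract describes. To probe the claim about locally periodic inputs, `energy_response` can be compared with the input energy for a sinusoid versus white noise across many random draws.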

Code Implementations: 1 repo
