EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use
This is an incremental improvement for audio classification researchers: it reduces the computational cost of a learnable frontend, but does not solve the core problem that learnable frontends fail to consistently beat a fixed mel filterbank.
The paper tackles the computational expense of LEAF, a learnable audio frontend, by proposing EfficientLEAF, which uses inhomogeneous convolution kernel sizes and strides and replaces PCEN with more easily parallelizable operations. EfficientLEAF matches the accuracy of LEAF at 3% of the cost on six audio classification tasks, but neither frontend consistently outperforms a fixed mel filterbank.
In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN with better parallelizable operations, we can reach similar results more efficiently. In experiments on six audio classification tasks, our frontend matches the accuracy of LEAF at 3% of the cost, but both fail to consistently outperform a fixed mel filterbank. The quest for learnable audio frontends is not solved.
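To illustrate why PCEN is a natural target for replacement, the following is a minimal NumPy sketch of the standard PCEN formulation, not the code of LEAF or EfficientLEAF. The function name `pcen`, the `(channels, frames)` input layout, and the default parameter values are assumptions chosen for illustration. The smoothing step is a first-order recursive filter over time, so each frame depends on the previous one, which is what limits parallelism. The operations EfficientLEAF uses instead are not reproduced here.

```python
import numpy as np

def pcen(energy, s=0.04, alpha=0.96, delta=2.0, r=0.5, eps=1e-12):
    """Per-Channel Energy Normalization (illustrative sketch).

    energy: non-negative array of shape (channels, frames).
    Parameter defaults are assumed values for illustration only.
    """
    energy = np.asarray(energy, dtype=np.float64)
    smoothed = np.empty_like(energy)
    # Initialise the smoother with the first frame.
    smoothed[:, 0] = energy[:, 0]
    # Sequential recursion: frame t needs the result for frame t - 1.
    for t in range(1, energy.shape[1]):
        smoothed[:, t] = (1.0 - s) * smoothed[:, t - 1] + s * energy[:, t]
    # Adaptive gain control followed by root compression.
    return (energy / (eps + smoothed) ** alpha + delta) ** r - delta ** r
```

The loop over `t` cannot be replaced by a single element-wise operation without reformulating the smoother, whereas the gain control and compression on the last line are trivially parallel across channels and frames.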