SD · LG · AS · Jan 21, 2021

LEAF: A Learnable Frontend for Audio Classification

arXiv:2101.08596v1 · 184 citations
Originality: Highly original
AI Analysis

LEAF is a general-purpose, efficient learned frontend for audio classification. It addresses a basic limitation that researchers and practitioners face in audio processing: reliance on fixed, hand-engineered features.

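The authors address the limitations of fixed, engineered audio features such as mel-filterbanks by developing LEAF, a learnable frontend. A single LEAF frontend outperforms mel-filterbanks across diverse audio signals, including speech, music, audio events, and animal sounds. In multi-task training on eight classification tasks it improves consistently over both mel-filterbanks and previous learnable alternatives. On Audioset it outperforms the current state-of-the-art learnable frontend with orders of magnitude fewer parameters.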

Mel-filterbanks are fixed, engineered audio features which emulate human perception and have been used through the history of audio understanding up to today. However, their undeniable qualities are counterbalanced by the fundamental limitations of handmade representations. In this work we show that we can train a single learnable frontend that outperforms mel-filterbanks on a wide range of audio signals, including speech, music, audio events and animal sounds, providing a general-purpose learned frontend for audio classification. To do so, we introduce a new principled, lightweight, fully learnable architecture that can be used as a drop-in replacement of mel-filterbanks. Our system learns all operations of audio features extraction, from filtering to pooling, compression and normalization, and can be integrated into any neural network at a negligible parameter cost. We perform multi-task training on eight diverse audio classification tasks, and show consistent improvements of our model over mel-filterbanks and previous learnable alternatives. Moreover, our system outperforms the current state-of-the-art learnable frontend on Audioset, with orders of magnitude fewer parameters.
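
The four stages named in the abstract (filtering, pooling, compression, normalization) can be pictured with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the operators, so the Gabor band-pass filters, Gaussian low-pass pooling, and PCEN-style compression and normalization used here are assumed choices, and every function name and parameter value is hypothetical. The parameters are fixed constants here; in a learnable frontend they would be trained jointly with the classifier.

```python
import numpy as np

def gabor_filters(center_freqs, bandwidths, size):
    """Complex Gabor kernels: Gaussian envelope times a complex sinusoid.
    center_freqs are in radians/sample, bandwidths are envelope widths in samples."""
    t = np.arange(size) - size // 2
    envelope = np.exp(-t[None, :] ** 2 / (2.0 * bandwidths[:, None] ** 2))
    carrier = np.exp(1j * center_freqs[:, None] * t[None, :])
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * bandwidths[:, None])
    return norm * envelope * carrier  # shape (n_filters, size)

def gaussian_pool(energy, sigma, size, stride):
    """Low-pass each channel with a Gaussian window, then subsample by `stride`."""
    t = np.arange(size) - size // 2
    window = np.exp(-0.5 * (t / sigma) ** 2)
    window /= window.sum()
    smoothed = np.stack([np.convolve(ch, window, mode="same") for ch in energy])
    return smoothed[:, ::stride]

def pcen(x, alpha=0.96, delta=2.0, root=2.0, smooth=0.04, eps=1e-6):
    """PCEN-style step: divide by a running average (normalization),
    then apply an offset root (compression)."""
    ema = np.empty_like(x)
    ema[:, 0] = x[:, 0]
    for t in range(1, x.shape[1]):
        ema[:, t] = (1.0 - smooth) * ema[:, t - 1] + smooth * x[:, t]
    return (x / (eps + ema) ** alpha + delta) ** (1.0 / root) - delta ** (1.0 / root)

def frontend(wave, n_filters=8, size=401, stride=160):
    """Waveform -> (n_filters, n_frames) features. All values are illustrative."""
    center_freqs = np.linspace(0.1, np.pi - 0.1, n_filters)  # assumed initial values
    bandwidths = np.full(n_filters, 40.0)                    # assumed initial values
    kernels = gabor_filters(center_freqs, bandwidths, size)
    filtered = np.stack([np.convolve(wave, k, mode="same") for k in kernels])
    energy = np.abs(filtered) ** 2                           # squared modulus
    pooled = gaussian_pool(energy, sigma=64.0, size=size, stride=stride)
    return pcen(pooled)

if __name__ == "__main__":
    wave = np.random.default_rng(0).standard_normal(16000)  # one second at 16 kHz
    features = frontend(wave)
    print(features.shape)
```

The output has the same time-frequency layout as a mel-filterbank spectrogram (channels by frames), which is what lets a frontend of this kind act as a drop-in replacement ahead of any classifier.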

Code Implementations: 4 repos
