SD AI AS

May 22, 2023

LEAN: Light and Efficient Audio Classification Network

Shwetank Choudhary, CR Karthik, Punuru Sri Lakshmi, Sumit Kumar

arXiv:2305.12712v1

5.89 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of deploying audio classification models on mobile and edge devices by offering a more efficient solution, though it is incremental as it builds on existing methods like YAMNet.

The paper tackles audio classification for resource-constrained devices by proposing LEAN, a lightweight model that combines a trainable wave encoder, a pretrained YAMNet, and cross-attention-based temporal realignment. On the FSD50K dataset it reaches a mean average precision (mAP) of 0.445 with a 4.5 MB memory footprint, a 22% improvement over the baseline on-device mAP.

Over the past few years, audio classification on large-scale datasets such as AudioSet has been an important research area. Several deep convolution-based neural networks have shown compelling performance, notably VGGish, YAMNet, and Pretrained Audio Neural Networks (PANNs). These models are available as pretrained architectures for transfer learning as well as for adaptation to specific audio tasks. In this paper, we propose LEAN, a lightweight on-device deep learning model for audio classification. LEAN consists of a raw-waveform-based temporal feature extractor, called the Wave Encoder, and a log-mel-based pretrained YAMNet. We show that combining the trainable Wave Encoder and the pretrained YAMNet with cross-attention-based temporal realignment yields competitive performance on downstream audio classification tasks with a smaller memory footprint, making the model suitable for resource-constrained devices such as mobile and edge devices. Our proposed system achieves an on-device mean average precision (mAP) of 0.445 with a memory footprint of a mere 4.5 MB on the FSD50K dataset, an improvement of 22% over the baseline on-device mAP on the same dataset.
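The abstract does not spell out how the cross-attention-based temporal realignment is wired, so the following is only a minimal NumPy sketch of the general idea: two feature streams with different frame counts are brought onto a common time axis by letting one stream attend over the other. Everything here is an assumption made for illustration, not the authors' implementation: the function name `cross_attention`, the choice of YAMNet frames as queries and wave encoder frames as keys and values, the random projection matrices, and all sequence lengths and feature sizes.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Scaled dot-product cross attention (hypothetical helper).

    queries:     (T_q, d_q) frames that define the output time axis
    keys_values: (T_k, d_k) frames that are re-sampled onto that axis
    Returns the realigned features (T_q, d) and the weights (T_q, T_k).
    """
    q = queries @ w_q                          # (T_q, d)
    k = keys_values @ w_k                      # (T_k, d)
    v = keys_values @ w_v                      # (T_k, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T_q, T_k)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)

# Illustrative shapes only: 50 wave encoder frames and 10 YAMNet frames.
T_wave, T_yam, d_wave, d_yam, d = 50, 10, 64, 1024, 32
wave_feats = rng.standard_normal((T_wave, d_wave))  # stand-in for wave encoder output
yam_feats = rng.standard_normal((T_yam, d_yam))     # stand-in for YAMNet embeddings

# Random projections stand in for weights that would be learned in training.
w_q = rng.standard_normal((d_yam, d)) / np.sqrt(d_yam)
w_k = rng.standard_normal((d_wave, d)) / np.sqrt(d_wave)
w_v = rng.standard_normal((d_wave, d)) / np.sqrt(d_wave)

# Realign the wave features onto the YAMNet time axis, then fuse the streams.
aligned, weights = cross_attention(yam_feats, wave_feats, w_q, w_k, w_v)
fused = np.concatenate([yam_feats, aligned], axis=-1)  # (T_yam, d_yam + d)
clip_embedding = fused.mean(axis=0)                    # pooled clip-level vector
```

In this sketch the realigned wave features share the YAMNet frame count, so the two streams can be concatenated frame by frame before pooling and classification. The opposite direction (wave encoder frames as queries) would work the same way with the roles swapped.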
