Jun 13, 2022

Low-complexity deep learning frameworks for acoustic scene classification

arXiv:2206.06057v1 · 3 citations · h-index: 18
Originality: Synthesis-oriented
AI Analysis

This work addresses acoustic scene classification for audio-analysis applications. Its contribution is incremental: it combines existing methods such as spectrogram extraction, data augmentation, and late fusion.

The authors tackle acoustic scene classification with low-complexity deep learning frameworks, reaching a classification accuracy of 60.1% on the DCASE 2022 Task 1 Development dataset, 17.2 percentage points above the DCASE baseline.

In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we first transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods Random Cropping, SpecAugment, and Mixup are applied to generate augmented spectrograms, which are then fed into deep-learning-based classifiers. Finally, to achieve the best performance, we fuse the probabilities obtained from three individual classifiers, each trained independently on one of the three types of spectrogram. Our experiments, conducted on the DCASE 2022 Task 1 Development dataset, fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%, improving on the DCASE baseline by 17.2%.
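The Mixup and late-fusion steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `alpha` value, and the choice of a plain average as the fusion rule are assumptions made for illustration only, since the abstract does not state them. Spectrograms are represented as nested lists to keep the example dependency-free.

```python
import random


def mixup(spec_a, label_a, spec_b, label_b, alpha=0.4, rng=random):
    """Blend two spectrograms and their one-hot labels.

    The mixing weight is drawn from Beta(alpha, alpha); alpha=0.4 is an
    assumed value, not one taken from the report.
    """
    lam = rng.betavariate(alpha, alpha)
    mixed_spec = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(spec_a, spec_b)
    ]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_spec, mixed_label


def late_fusion(prob_vectors):
    """Fuse per-classifier class probabilities by averaging (assumed rule)."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]


def predict(prob_vectors):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(prob_vectors)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of the Mel, Gammatone, and CQT classifiers
    # for one recording and three scene classes.
    p_mel = [0.5, 0.3, 0.2]
    p_gam = [0.2, 0.5, 0.3]
    p_cqt = [0.1, 0.6, 0.3]
    print(predict([p_mel, p_gam, p_cqt]))  # class index 1
```

In this toy input the Mel classifier alone would pick class 0, but the fused probabilities favour class 1, which is the kind of disagreement late fusion is meant to resolve.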
