SD, LG, AS | Sep 18, 2024

Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction

arXiv:2409.11964v1 | 3 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses acoustic scene classification with limited training data, a problem relevant to researchers and practitioners in audio processing. The advance is incremental, since it builds on existing methods such as FocusNet and PaSST.

The paper tackles data-efficient acoustic scene classification by introducing three systems for different training split sizes. On the TAU Urban Acoustic Scene 2022 Mobile dataset, the highest average testing accuracy across the three systems ranges from 62.21% on the 100% split down to 47.97% on the 5% split.

In this technical report, we describe the SNTL-NTU team's submission for Task 1, Data-Efficient Low-Complexity Acoustic Scene Classification, of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 challenge. Three systems are introduced to tackle training splits of different sizes. For the small training splits, we explore reducing the complexity of the provided baseline model by reducing its number of base channels. We also introduce data augmentation in the form of mixup to increase the diversity of training samples. For the larger training splits, we use FocusNet to provide confusing-class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained at the original sampling rate of 44.1 kHz. We then use knowledge distillation to distill the ensemble into the baseline student model. Training the systems on the TAU Urban Acoustic Scene 2022 Mobile development dataset yielded highest average testing accuracies of 62.21%, 59.82%, 56.81%, 53.03%, and 47.97% on the 100%, 50%, 25%, 10%, and 5% splits, respectively, across the three systems.
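The abstract names two generic techniques, mixup augmentation and knowledge distillation, without giving their details. The sketch below illustrates the standard textbook form of each in plain Python. It is not the team's implementation. The function names and the values of `alpha`, `temperature`, and `kd_weight` are assumptions chosen for illustration, and the teacher logits stand in for whatever output the ensemble actually produces.

```python
import math
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.3, rng=random):
    """Blend two (features, one-hot label) pairs.

    The mixing weight is drawn from Beta(alpha, alpha); alpha=0.3 is an
    assumed value, not one taken from the report.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, kd_weight=0.9):
    """Weighted sum of hard-label cross-entropy and softened KL divergence.

    temperature and kd_weight are assumed hyperparameters for this sketch.
    """
    # Cross-entropy of the student against the ground-truth class index.
    ce = -math.log(softmax(student_logits)[hard_label])

    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student) if t > 0.0)

    # The temperature**2 factor keeps the soft-target gradient scale
    # comparable to the hard-label term.
    return (1.0 - kd_weight) * ce + kd_weight * temperature ** 2 * kl
```

In a training loop of this kind, `mixup` would be applied to pairs of spectrograms and their labels before the forward pass, and `distillation_loss` would replace the plain cross-entropy when training the student against the teacher's logits.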

