SD · LG · AS · Nov 12, 2021

Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization

arXiv:2111.06531v1 · 18 citations
Originality: Incremental advance
AI Analysis

This work addresses device variability in audio classification for practical applications and offers incremental improvements in efficiency.

The paper tackles acoustic scene classification with multi-device audio inputs by proposing Residual Normalization and an efficient BC-ResNet-ASC architecture, achieving 76.3% accuracy with 315k parameters and 75.3% after compression to 61.0KB, winning first place in the DCASE 2021 challenge.

Handling multi-device audio inputs with a single, efficiently designed acoustic scene classification system is a practical research topic. In this work, we propose Residual Normalization, a novel feature normalization method that uses frequency-wise normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Moreover, we introduce an efficient architecture, BC-ResNet-ASC, a modified version of the baseline architecture with a limited receptive field. BC-ResNet-ASC outperforms the baseline architecture even though it contains a small number of parameters. Through three model compression schemes (pruning, quantization, and knowledge distillation), we can reduce model complexity further while mitigating the performance degradation. The proposed system achieves an average test accuracy of 76.3% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset with 315k parameters, and an average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters. The proposed method won 1st place in the DCASE 2021 challenge, Task 1A.
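
The abstract describes Residual Normalization only as frequency-wise normalization combined with a shortcut path. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single-channel spectrogram stored as a list of frequency rows, statistics computed over time within each frequency bin, and a combination of the form `lam * x + freq_norm(x)`. The names `freq_norm`, `residual_norm`, and `lam`, as well as the default values, are hypothetical choices made for illustration.

```python
import math


def freq_norm(spec, eps=1e-5):
    """Normalize each frequency row of a (freq x time) spectrogram
    to zero mean and (approximately) unit variance over time."""
    out = []
    for row in spec:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        scale = math.sqrt(var + eps)
        out.append([(v - mean) / scale for v in row])
    return out


def residual_norm(spec, lam=0.1, eps=1e-5):
    """Assumed form: shortcut path scaled by lam plus the
    frequency-wise normalized input."""
    normed = freq_norm(spec, eps)
    return [
        [lam * v + n for v, n in zip(row, nrow)]
        for row, nrow in zip(spec, normed)
    ]


# Toy example: the same "scene" recorded by two hypothetical devices,
# where device B applies a different gain and offset per frequency bin.
scene = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.1, 0.9, 0.3]]
device_b = [[2.0 * v + 1.0 for v in scene[0]],
            [0.5 * v - 0.2 for v in scene[1]]]

norm_a = freq_norm(scene)
norm_b = freq_norm(device_b)        # nearly identical to norm_a
res_a = residual_norm(scene)        # keeps a lam-scaled copy of the input
```

In this toy setting, the frequency-wise normalization removes the per-bin gain and offset that distinguish the two devices, while the shortcut term retains a scaled copy of the original input so that information removed by normalization is not lost entirely. How the shortcut weight is set or learned in the actual system is not stated in the abstract.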

