SD · CV · AS · Mar 21, 2025

Improving Acoustic Scene Classification with City Features

arXiv:2503.16862v2 · 1 citation · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses acoustic scene classification for audio analysis applications; it is an incremental advance because it builds on existing knowledge distillation methods.

The paper tackles acoustic scene classification by leveraging city-specific features, which are usually treated as noise, and finds that distilling this city knowledge improves accuracy across lightweight CNN models, reaching performance competitive with the top solutions in the DCASE Challenge.

Acoustic scene recordings are often collected from a diverse range of cities. Most existing acoustic scene classification (ASC) approaches focus on identifying common acoustic scene patterns across cities to enhance generalization. However, the potential acoustic differences introduced by city-specific environmental and cultural factors are overlooked. In this paper, we hypothesize that city-specific acoustic features are beneficial for the ASC task rather than being noise or bias. To this end, we propose City2Scene, a novel framework that leverages city features to improve ASC. Unlike conventional approaches that may discard or suppress city information, City2Scene transfers city-specific knowledge from pre-trained city classification models to a scene classification model using knowledge distillation. We evaluate City2Scene on three datasets from DCASE Challenge Task 1, which include both scene and city labels. Experimental results demonstrate that city features provide valuable information for classifying scenes. By distilling city-specific knowledge, City2Scene effectively improves accuracy across a variety of lightweight CNN backbones, achieving performance competitive with the top-ranked solutions of the DCASE Challenge in recent years.
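The abstract does not spell out the distillation objective, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a standard Hinton-style logit distillation in which the scene model carries a hypothetical auxiliary city head whose softened outputs are pushed toward those of a frozen, pre-trained city classifier, while the main head is trained on scene labels. The function names, the weight `alpha`, the `temperature`, and the array shapes are all illustrative assumptions.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean cross-entropy between logits of shape (N, C) and integer labels (N,)."""
    p = softmax(logits)
    n = p.shape[0]
    return float(-np.mean(np.log(p[np.arange(n), labels] + 1e-12)))


def kd_kl(student_logits, teacher_logits, temperature=2.0):
    """Hinton-style KD term: T^2 * KL(teacher || student) on softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(temperature ** 2 * np.mean(kl))


def city2scene_style_loss(scene_logits, scene_labels,
                          student_city_logits, teacher_city_logits,
                          alpha=0.5, temperature=2.0):
    """Hypothetical objective: scene cross-entropy + alpha * city distillation.

    The auxiliary city head on the student and the weighting scheme are
    assumptions for illustration; they are not taken from the paper.
    """
    scene_term = cross_entropy(scene_logits, scene_labels)
    city_term = kd_kl(student_city_logits, teacher_city_logits, temperature)
    return scene_term + alpha * city_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, n_scenes, n_cities = 4, 10, 12          # illustrative sizes only
    scene_logits = rng.normal(size=(batch, n_scenes))
    scene_labels = rng.integers(0, n_scenes, size=batch)
    student_city = rng.normal(size=(batch, n_cities))
    teacher_city = rng.normal(size=(batch, n_cities))  # stands in for a frozen city model
    loss = city2scene_style_loss(scene_logits, scene_labels, student_city, teacher_city)
    print(f"combined loss: {loss:.4f}")
```

Setting `alpha=0` reduces the sketch to plain scene classification, which makes the contribution of the city term easy to ablate; the `T^2` factor is the usual Hinton scaling that keeps the distillation gradients comparable in magnitude across temperatures.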
