SD · MM · AS · Nov 11, 2018

Multi-Temporal Resolution Convolutional Neural Networks for Acoustic Scene Classification

arXiv:1811.04419v1 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses acoustic scene classification for audio processing applications, presenting an incremental improvement over existing methods.

The paper tackles acoustic scene classification by proposing a multi-temporal resolution CNN architecture that processes Mel-Spectrogram segments at varying resolutions to capture spectral texture and acoustic event distributions, achieving a 3.56% absolute improvement over the best single-resolution model and 12.49% over the DCASE 2017 baseline.

In this paper we present a Deep Neural Network architecture for the task of acoustic scene classification which harnesses information from increasing temporal resolutions of Mel-Spectrogram segments. This architecture is composed of separate, parallel Convolutional Neural Networks which learn spectral and temporal representations for each input resolution. The resolutions are chosen to cover fine-grained characteristics of a scene's spectral texture as well as its distribution of acoustic events. The proposed model shows a 3.56% absolute improvement over the best-performing single-resolution model and a 12.49% improvement over the DCASE 2017 Acoustic Scenes Classification task baseline.
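The parallel-branch idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the resolutions are produced, what layers each CNN contains, or how the branch outputs are combined. The sketch assumes one precomputed Mel-spectrogram segment per resolution, uses a hypothetical `band_means` function as a stand-in for a trained CNN branch, and assumes the branch outputs are fused by concatenation. All names (`band_means`, `fuse_parallel_branches`, `toy_segment`) are illustrative.

```python
from typing import Callable, List, Sequence

Segment = List[List[float]]            # indexed as [time frame][mel band]
Branch = Callable[[Segment], List[float]]


def band_means(segment: Segment) -> List[float]:
    """Hypothetical stand-in for one CNN branch: average each mel band over time."""
    n_frames = len(segment)
    n_bands = len(segment[0])
    return [sum(frame[b] for frame in segment) / n_frames for b in range(n_bands)]


def fuse_parallel_branches(segments: Sequence[Segment],
                           branches: Sequence[Branch]) -> List[float]:
    """Apply one branch per input resolution and concatenate the outputs.

    Concatenation is an assumption; the abstract does not state the fusion step.
    """
    if len(segments) != len(branches):
        raise ValueError("need exactly one branch per input resolution")
    fused: List[float] = []
    for segment, branch in zip(segments, branches):
        fused.extend(branch(segment))
    return fused


def toy_segment(n_frames: int, n_bands: int = 2) -> Segment:
    """Deterministic toy data standing in for a Mel-spectrogram segment."""
    return [[float(t + b) for b in range(n_bands)] for t in range(n_frames)]


# Three toy inputs covering increasing time spans (2, 4 and 8 frames);
# these lengths are arbitrary and not taken from the paper.
segments = [toy_segment(n) for n in (2, 4, 8)]
fused = fuse_parallel_branches(segments, [band_means] * len(segments))
```

In a real model each branch would be a separately parameterised CNN, and the fused vector would feed a classifier over the scene labels. The sketch only shows the data flow: one branch per resolution, with the outputs joined into a single representation.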

