SD LG AS SP ML Jan 6, 2019

Enhancing Sound Texture in CNN-Based Acoustic Scene Classification

arXiv:1901.01502v1, 40 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving acoustic scene classification for audio processing applications, but it is incremental as it builds on existing CNN methods with feature enhancement techniques.

The study examines how acoustic scenes are perceived in CNN models by using Class Activation Mapping to analyze log-Mel features, finding that background sound texture is well learned while distinct high-energy components generally do not produce strong activations. To make sound texture more salient, the authors applied Difference of Gaussian and Sobel operators to the log-Mel features, which significantly improved audio scene classification performance on the DCASE 2017 ASC challenge.
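As a rough illustration of the Class Activation Mapping step mentioned above, the sketch below computes a CAM as a weighted sum of the last convolutional feature maps, using the dense-layer weights of the chosen class. It assumes a CNN that ends in global average pooling followed by a single dense layer, which is the standard CAM setting; the network used in the paper is not described here, and the function name, argument names, and toy arrays are hypothetical.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Compute a CAM for one class.

    feature_maps: array of shape (K, H, W), output of the last conv layer.
    fc_weights:   array of shape (n_classes, K), weights of the dense layer
                  that follows global average pooling (assumed architecture).
    """
    # Weighted sum over the K channels gives an (H, W) activation map.
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))
    # Rescale to [0, 1] so the map can be overlaid on the log-Mel image.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: two channels, each responding at a single time-frequency bin.
feature_maps = np.zeros((2, 3, 4))
feature_maps[0, 1, 2] = 1.0
feature_maps[1, 0, 0] = 1.0
fc_weights = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
cam_class0 = class_activation_map(feature_maps, fc_weights, class_idx=0)
```

In practice the map would be upsampled to the size of the input log-Mel image before comparing it with the high-energy regions of the spectrogram.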

Acoustic scene classification is the task of identifying the scene in which an audio signal was recorded. Convolutional neural network (CNN) models are widely adopted with proven success in acoustic scene classification. However, there is little insight into how an audio scene is perceived by a CNN, in contrast to what has been demonstrated in image recognition research. In the present study, Class Activation Mapping (CAM) is used to analyze how the log-magnitude Mel-scale filter-bank (log-Mel) features of different acoustic scenes are learned by a CNN classifier. It is noted that distinct high-energy time-frequency components of audio signals generally do not correspond to strong activation on the CAM, while the background sound texture is well learned by the CNN. In order to make the sound texture more salient, we propose applying the Difference of Gaussian (DoG) and Sobel operators to the log-Mel features to enhance the edge information of the time-frequency image. Experimental results on the DCASE 2017 ASC challenge show that using edge-enhanced log-Mel images as the input feature of the CNN significantly improves the performance of audio scene classification.
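The edge enhancement described above can be sketched with standard image filters. The example below is a minimal illustration, not the authors' implementation: the Gaussian widths (`sigma_narrow`, `sigma_wide`), the use of the Sobel gradient magnitude, and the stacking of the three images as input channels are all assumptions, since the abstract does not specify them. The log-Mel input is a synthetic stand-in.

```python
import numpy as np
from scipy import ndimage

def edge_enhance_log_mel(log_mel, sigma_narrow=1.0, sigma_wide=2.0):
    """Return DoG and Sobel edge maps of a (n_mels, n_frames) log-Mel image.

    The sigma values are illustrative defaults, not taken from the paper.
    """
    log_mel = np.asarray(log_mel, dtype=np.float64)
    # Difference of Gaussian: subtract a wide blur from a narrow blur,
    # which acts as a band-pass filter that emphasizes edges.
    dog = (ndimage.gaussian_filter(log_mel, sigma_narrow)
           - ndimage.gaussian_filter(log_mel, sigma_wide))
    # Sobel gradients along frequency (axis 0) and time (axis 1),
    # combined into a gradient magnitude (an assumed choice).
    grad_freq = ndimage.sobel(log_mel, axis=0)
    grad_time = ndimage.sobel(log_mel, axis=1)
    sobel_mag = np.hypot(grad_freq, grad_time)
    return dog, sobel_mag

# Synthetic stand-in for a log-Mel image: noise plus one narrowband component.
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(40, 100))
log_mel[10, :] += 5.0
dog, sobel_mag = edge_enhance_log_mel(log_mel)

# One possible way to feed the result to a CNN: stack as input channels.
stacked = np.stack([log_mel, dog, sobel_mag], axis=0)
```

Both filters leave a flat region at zero and respond only where the time-frequency image changes, which is why they make texture boundaries more prominent relative to absolute energy.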
