SD, AI, AS · Aug 24, 2024

Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification

arXiv:2408.13644v1 · h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses the complex problem of environmental sound recognition for audio analysis applications, but it is incremental as it builds on existing CNN models and introduces minor methodological tweaks.

The paper tackles environmental sound classification by proposing a two-level classification method and analyzing audio filters, achieving a maximum accuracy of 78.75% for Level 1 and 98.04% for Level 2 on the ESC-50 dataset.

Environmental sound classification is an important sound recognition problem, and it is harder than speech recognition because environmental sounds are not well structured in time or frequency. Over the past years, researchers have used various CNN models to learn from audio features generated from the audio files, such as log-mel spectrograms, gammatone spectral coefficients, and mel-frequency spectral coefficients. In this paper, we propose a new methodology, Two-Level Classification: the Level 1 Classifier assigns the audio signal to a broader class, and the Level 2 Classifiers then determine the actual class to which the audio belongs, based on the output of the Level 1 Classifier. We also show the effects of different audio filters, among them a new Audio Crop method introduced in this paper, which gave the highest accuracies in most cases. We used the ESC-50 dataset for our experiments and obtained a maximum accuracy of 78.75% for Level 1 Classification and 98.04% for Level 2 Classification.
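The two-level routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TwoLevelClassifier`, the toy threshold functions, and the labels used below are all hypothetical stand-ins for trained CNN models, chosen only to show how the Level 1 output selects which Level 2 classifier runs.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


class TwoLevelClassifier:
    """Route a sample through a coarse classifier, then a per-group fine classifier."""

    def __init__(self, level1: Classifier, level2: Dict[str, Classifier]) -> None:
        self.level1 = level1  # predicts the broader class
        self.level2 = level2  # one fine-grained classifier per broader class

    def predict(self, features: Features) -> Tuple[str, str]:
        broad = self.level1(features)
        if broad not in self.level2:
            raise KeyError(f"no Level 2 classifier for broad class {broad!r}")
        # Only the Level 2 classifier chosen by Level 1 sees the sample.
        fine = self.level2[broad](features)
        return broad, fine


# Toy stand-ins with hypothetical thresholds (assumptions, not trained models).
def toy_level1(x: Features) -> str:
    return "animals" if x[0] < 0.5 else "urban"


def toy_animals(x: Features) -> str:
    return "dog" if x[1] < 0.5 else "rooster"


def toy_urban(x: Features) -> str:
    return "siren" if x[1] < 0.5 else "car_horn"


model = TwoLevelClassifier(
    toy_level1, {"animals": toy_animals, "urban": toy_urban}
)
print(model.predict([0.2, 0.9]))
```

One consequence of this design, at least in the sketch: a sample misrouted at Level 1 reaches a Level 2 classifier that does not contain its true class, so the error cannot be corrected downstream. The Level 1 and Level 2 accuracies quoted in the abstract are therefore best read as separate figures, not as an end-to-end accuracy.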
