Soundscapes in Spectrograms: Pioneering Multilabel Classification for South Asian Sounds
This research lays the groundwork for more robust and accurate audio classification in urban monitoring and cultural soundscape analysis, particularly in regions such as South Asia, where sounds are complex and frequently overlap.
This study addresses the challenge of classifying complex, overlapping environmental sounds in South Asia by introducing a novel spectrogram-based methodology. It uses a Convolutional Neural Network (CNN) to perform multilabel, multiclass classification on the SAS-KIIT dataset and validates its robustness on the UrbanSound8K dataset, achieving higher classification accuracy than existing MFCC-based techniques.
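In a multilabel, multiclass setting, a single clip can carry several labels at once (for example, a horn sounding over speech), so the targets and output layer differ from ordinary single-label classification. The sketch below shows one common way to handle this, assuming multi-hot targets, an independent sigmoid per class, and per-class binary cross-entropy. The class names, the `multi_hot`/`predict` helpers, and the 0.5 threshold are hypothetical illustrations, not the SAS-KIIT label set or the authors' exact training setup.

```python
import numpy as np

# Hypothetical label vocabulary for illustration; NOT the actual SAS-KIIT classes.
CLASSES = ["horn", "speech", "bell", "rain"]

def multi_hot(labels, classes=CLASSES):
    """Encode a set of co-occurring labels as a 0/1 vector (one slot per class)."""
    vec = np.zeros(len(classes), dtype=np.float32)
    for name in labels:
        vec[classes.index(name)] = 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(logits, targets, eps=1e-7):
    """Mean per-class BCE: each class is an independent yes/no decision,
    unlike softmax cross-entropy, which assumes exactly one class per clip."""
    p = np.clip(sigmoid(np.asarray(logits, dtype=np.float64)), eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

def predict(logits, threshold=0.5, classes=CLASSES):
    """Return every class whose sigmoid probability exceeds the threshold."""
    probs = sigmoid(np.asarray(logits, dtype=np.float64))
    return [c for c, p in zip(classes, probs) if p > threshold]

if __name__ == "__main__":
    target = multi_hot(["horn", "bell"])          # two overlapping sources
    logits = np.array([2.0, -1.5, 0.8, -3.0])      # hypothetical CNN output for one clip
    print(predict(logits))                         # ['horn', 'bell']
    print(binary_cross_entropy(logits, target))
```

Because each class has its own sigmoid, the network can report any number of simultaneous sounds; the decision threshold is a tunable assumption and is often set per class on validation data.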
Environmental sound classification is a field of growing importance for urban monitoring and cultural soundscape analysis, especially within the acoustically rich environments of South Asia. These regions pose a distinct challenge because natural, human, and cultural sounds often overlap, straining traditional methods that typically rely on Mel Frequency Cepstral Coefficients (MFCC). This study introduces a novel spectrogram-based methodology that captures these complex auditory patterns more effectively. A Convolutional Neural Network (CNN) architecture is implemented to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset. To demonstrate robustness and comparability, the approach is also validated on the widely used UrbanSound8K dataset. The results confirm that the proposed spectrogram-based method significantly outperforms existing MFCC-based techniques, achieving higher classification accuracy on both datasets. This improvement lays the groundwork for more robust and accurate audio classification systems in real-world applications.
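To make the contrast with MFCC concrete: MFCC extraction applies a mel filterbank, a logarithm, and a discrete cosine transform, compressing each frame to a small number of coefficients, whereas a spectrogram keeps the full time-frequency grid that a CNN can treat as a 2D image. The sketch below computes a log-magnitude STFT spectrogram with NumPy only. The frame size (1024), hop (512), Hann window, and decibel scaling are assumptions chosen for illustration; the source does not state the authors' spectrogram parameters or whether a linear or mel frequency axis was used.

```python
import numpy as np

def log_spectrogram(signal, n_fft=1024, hop=512, eps=1e-10):
    """Log-magnitude STFT spectrogram with shape (n_fft // 2 + 1, n_frames).

    Assumed parameters for illustration only: Hann window, no centering,
    magnitudes converted to decibels.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:  # pad short clips so at least one frame exists
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, n_fft // 2 + 1)
    return 20.0 * np.log10(magnitude.T + eps)         # frequency on rows, time on columns

if __name__ == "__main__":
    sr = 22050                                    # assumed sample rate
    t = np.arange(sr) / sr                        # 1 second of audio
    tone = np.sin(2 * np.pi * 1000.0 * t)         # synthetic 1 kHz tone
    spec = log_spectrogram(tone)
    print(spec.shape)                             # (513, 42)
    peak_bin = int(np.argmax(spec.mean(axis=1)))
    print(peak_bin * sr / 1024)                   # frequency of the strongest bin, in Hz
```

The resulting 2D array (optionally with a channel axis added) is the kind of input a CNN would consume; overlapping sources appear as distinct regions in this grid rather than being folded into a handful of cepstral coefficients.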