CURE Dataset: Ladder Networks for Audio Event Classification
This work addresses audio event classification for people with hearing loss, but it is incremental as it builds on existing methods like CNNs and ladder networks.
The paper tackled audio event classification for people with hearing loss by establishing the CURE dataset and proposing a ladder network classifier, which achieved increased classification accuracy and robustness compared to SVM and ELM baselines.
Audio event classification is an important task for several applications such as surveillance, audio, video and multimedia retrieval etc. There are approximately 3M people with hearing loss who can't perceive events happening around them. This paper establishes the CURE dataset which contains curated set of specific audio events most relevant for people with hearing loss. We propose a ladder network based audio event classifier that utilizes 5s sound recordings derived from the Freesound project. We adopted the state-of-the-art convolutional neural network (CNN) embeddings as audio features for this task. We also investigate extreme learning machine (ELM) for event classification. In this study, proposed classifiers are compared with support vector machine (SVM) baseline. We propose signal and feature normalization that aims to reduce the mismatch between different recordings scenarios. Firstly, CNN is trained on weakly labeled Audioset data. Next, the pre-trained model is adopted as feature extractor for proposed CURE corpus. We incorporate ESC-50 dataset as second evaluation set. Results and discussions validate the superiority of Ladder network over ELM and SVM classifier in terms of robustness and increased classification accuracy. While Ladder network is robust to data mismatches, simpler SVM and ELM classifiers are sensitive to such mismatches, where the proposed normalization techniques can play an important role. Experimental studies with ESC-50 and CURE corpora elucidate the differences in dataset complexity and robustness offered by proposed approaches.