LGJul 4, 2019Code
Adversarial Attacks in Sound Event ClassificationVinod Subramanian, Emmanouil Benetos, Ning Xu et al.
Adversarial attacks refer to a set of methods that perturb the input to a classification model in order to fool the classifier. In this paper we apply different gradient based adversarial attack algorithms on five deep learning models trained for sound event classification. Four of the models use mel-spectrogram input and one model uses raw audio input. The models represent standard architectures such as convolutional, recurrent and dense networks. The dataset used for training is the Freesound dataset released for task 2 of the DCASE 2018 challenge and the models used are from participants of the challenge who open sourced their code. Our experiments show that adversarial attacks can be generated with high confidence and low perturbation. In addition, we show that the adversarial attacks are very effective across the different models.
SDOct 18, 2021
Learning Models for Query by Vocal Percussion: A Comparative StudyAlejandro Delgado, SkoT McDonald, Ning Xu et al.
The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Thus, the automatic retrieval of drum sounds using vocal percussion can help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. Here we explore different strategies to perform this type of query, making use of both traditional machine learning algorithms and recent deep learning techniques. The main hyperparameters from the models involved are carefully selected by feeding performance metrics to a grid search algorithm. We also look into several audio data augmentation techniques, which can potentially regularise deep learning models and improve generalisation. We compare the final performances in terms of effectiveness (classification accuracy), efficiency (computational speed), stability (performance consistency), and interpretability (decision patterns), and discuss the relevance of these results when it comes to the design of successful query-by-vocal-percussion systems.