AS · CV · LG · SD · Aug 11, 2020

Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms

arXiv:2008.04590v1 · 12 citations
AI Analysis

This work addresses the challenge of limited labeled data for audio classification in a specific domain, but it is incremental as it applies existing methods to a new dataset.

The paper tackles surgical mask detection from audio using convolutional neural networks on mel-spectrograms, showing that data augmentation improves performance and that the resulting models outperform most baselines from the ComParE Challenge 2020.

In many fields of research, labeled datasets are hard to acquire. Data augmentation promises to compensate for this lack of training data in neural network engineering and classification tasks. The idea is to reduce model over-fitting to the feature distribution of a small, under-descriptive training dataset. We evaluate such data augmentation techniques to gain insight into the performance boost they provide for several convolutional neural networks on mel-spectrogram representations of audio data. We show the impact of data augmentation on the binary classification task of surgical mask detection in samples of human voice (ComParE Challenge 2020). We also consider four different architectures to account for augmentation robustness. Results show that most of the baselines given by ComParE are outperformed.
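To make the idea of augmenting spectrograms concrete, here is a minimal sketch in Python with NumPy. The abstract does not list the specific augmentations or their settings, so the three transforms below (time shift, additive noise, frequency masking), the function name `augment_spectrogram`, and every default parameter are illustrative assumptions rather than the authors' method. The input is a random array standing in for a real mel-spectrogram.

```python
import numpy as np


def augment_spectrogram(spec, rng, max_shift=8, noise_std=0.05, mask_width=4):
    """Return an augmented copy of a (n_mels, n_frames) spectrogram.

    The transforms and default values are assumptions chosen for
    illustration; they are not taken from the paper.
    """
    out = spec.astype(np.float32, copy=True)  # never modify the caller's array

    # 1) Circular shift along the time axis by a random number of frames.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(out, shift, axis=1)

    # 2) Additive Gaussian noise, scaled by the spectrogram's own spread.
    noise = rng.normal(0.0, noise_std * out.std(), size=out.shape)
    out += noise.astype(np.float32)

    # 3) Frequency masking: overwrite a random band of mel bins with the mean.
    start = int(rng.integers(0, out.shape[0] - mask_width + 1))
    out[start:start + mask_width, :] = out.mean()

    return out


rng = np.random.default_rng(0)
spec = rng.random((64, 100)).astype(np.float32)  # stand-in for a mel-spectrogram
aug = augment_spectrogram(spec, rng)
```

Applying a function like this on the fly during training means the network sees a slightly different version of each sample every epoch, which is the mechanism by which augmentation is expected to reduce over-fitting on a small dataset.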
