AS · CV · SD · Jun 17, 2020

Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs

arXiv:2006.10147v2 · 42 citations
Originality: Incremental advance
AI Analysis

The paper addresses mask detection from speech for applications such as forensic investigations and healthcare, but the advance is incremental because it builds on existing GAN and augmentation techniques.

The paper tackles the problem of detecting, from speech alone, whether a person is wearing a face mask. It proposes a data augmentation method that uses cycle-consistent GANs to translate utterances between the masked and unmasked classes. The augmentation improved performance by 0.9% on the private test set, and the full system surpassed the organizers' baseline by 2.8% in the Mask Sub-Challenge of the INTERSPEECH 2020 Computational Paralinguistics Challenge.

The task of detecting whether a person wears a face mask from speech is useful in modelling speech in forensic investigations, communication between surgeons or people protecting themselves against infectious diseases such as COVID-19. In this paper, we propose a novel data augmentation approach for mask detection from speech. Our approach is based on (i) training Generative Adversarial Networks (GANs) with cycle-consistency loss to translate unpaired utterances between two classes (with mask and without mask), and on (ii) generating new training utterances using the cycle-consistent GANs, assigning opposite labels to each translated utterance. Original and translated utterances are converted into spectrograms which are provided as input to a set of ResNet neural networks with various depths. The networks are combined into an ensemble through a Support Vector Machines (SVM) classifier. With this system, we participated in the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 Computational Paralinguistics Challenge, surpassing the baseline proposed by the organizers by 2.8%. Our data augmentation technique provided a performance boost of 0.9% on the private test set. Furthermore, we show that our data augmentation approach yields better results than other baseline and state-of-the-art augmentation methods.
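The augmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the label encoding (1 = with mask, 0 = without mask), and the toy scaling "generators" are all assumptions made for illustration; in the paper the generators are trained cycle-consistent GANs. The loss shown is the usual L1 form of cycle-consistency, which may differ in detail from the one the authors used.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 cycle loss: translating to the other class and back should recover the input."""
    forward = np.mean(np.abs(g_yx(g_xy(x)) - x))   # x -> Y -> back to X
    backward = np.mean(np.abs(g_xy(g_yx(y)) - y))  # y -> X -> back to Y
    return forward + backward


def augment_with_translations(samples, labels, to_mask, to_no_mask):
    """Return the original samples plus one translated copy of each,
    where every translated copy carries the opposite label."""
    new_samples, new_labels = [], []
    for sample, label in zip(samples, labels):
        # Assumed encoding: 1 = with mask, 0 = without mask.
        translated = to_no_mask(sample) if label == 1 else to_mask(sample)
        new_samples.append(translated)
        new_labels.append(1 - label)
    return list(samples) + new_samples, list(labels) + new_labels


# Hypothetical stand-ins for trained generators: a mask is crudely modelled
# as halving the signal, and removing it as doubling the signal.
def toy_to_mask(x):
    return x * 0.5


def toy_to_no_mask(x):
    return x * 2.0


samples = [np.ones((2, 3)), np.full((2, 3), 4.0)]
labels = [0, 1]

aug_samples, aug_labels = augment_with_translations(
    samples, labels, toy_to_mask, toy_to_no_mask
)
loss = cycle_consistency_loss(samples[0], samples[1], toy_to_mask, toy_to_no_mask)
```

Because the toy generators are exact inverses of each other, the cycle loss here is zero; trained generators only approximate this. In the pipeline the abstract describes, the original and translated utterances would next be converted to spectrograms and passed to the ResNet ensemble.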

