AS · LG · SD · Aug 12, 2020

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

arXiv:2008.05175v2
AI Analysis

This work addresses health monitoring challenges in speech processing, but it is incremental as it builds on existing methods for a specific challenge.

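The paper tackles mask detection and breath monitoring from speech. For mask detection it trains deep convolutional neural networks with several feature types and data augmentation; for breath monitoring it uses Bi-LSTM bottleneck features. It reports a Pearson correlation coefficient (PCC) of 0.746 for breath monitoring and an unweighted average recall (UAR) of 78.8% for mask detection.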
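The two reported metrics can be illustrated with a minimal sketch. This is not the challenge's official scoring script; the function names `unweighted_average_recall` and `pearson_cc` are hypothetical, and the inputs are toy values chosen only to show the arithmetic.

```python
from math import sqrt


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Toy data (illustrative only): class 0 recall is 2/3, class 1 recall is 1.
uar = unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1])
# A perfectly linear relation between prediction and target.
pcc = pearson_cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

UAR suits the binary mask task because it is insensitive to class imbalance, while PCC suits the breathing task because the target is a continuous signal.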

This paper introduces our approaches for the Mask and Breathing Sub-Challenges in the Interspeech ComParE Challenge 2020. For the mask detection task, we train deep convolutional neural networks with filter-bank energies, gender-aware features, and speaker-aware features. Support Vector Machines then serve as back-end classifiers for binary prediction on the extracted deep embeddings. Several data augmentation schemes, including speed perturbation, SpecAugment, and random erasing, are used to increase the quantity of training data and improve our models' robustness. For the speech breath monitoring task, we investigate different bottleneck features based on the Bi-LSTM structure. Experimental results show that our proposed methods outperform the baselines and achieve 0.746 PCC and 78.8% UAR on the Breathing and Mask evaluation sets, respectively.
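The three augmentation schemes named in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the mask sizes, the fill strategy for random erasing, and the linear-interpolation resampling are all choices made here for brevity, and the abstract does not give the actual parameters.

```python
import numpy as np


def speed_perturb(wave, factor):
    """Change playback speed by resampling with linear interpolation.
    factor > 1 shortens the signal; factor < 1 lengthens it."""
    n_out = int(round(len(wave) / factor))
    src = np.arange(n_out) * factor  # fractional positions in the input
    return np.interp(src, np.arange(len(wave)), wave)


def freq_time_mask(spec, max_f=8, max_t=20, rng=None):
    """SpecAugment-style masking on a (freq_bins, frames) array:
    zero out one random frequency band and one random time span."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    f = int(rng.integers(0, min(max_f, n_freq) + 1))  # band height
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, min(max_t, n_time) + 1))  # span width
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out


def random_erase(spec, max_h=10, max_w=10, rng=None):
    """Random erasing: overwrite one random rectangle with uniform noise
    drawn from the value range of the input (an assumed fill strategy)."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    h = int(rng.integers(1, min(max_h, n_freq) + 1))
    w = int(rng.integers(1, min(max_w, n_time) + 1))
    r0 = int(rng.integers(0, n_freq - h + 1))
    c0 = int(rng.integers(0, n_time - w + 1))
    out[r0:r0 + h, c0:c0 + w] = rng.uniform(spec.min(), spec.max(), size=(h, w))
    return out


# Usage on synthetic data: a 1-second waveform at an assumed 16 kHz rate
# and a 40-bin filter-bank matrix with 100 frames.
rng = np.random.default_rng(0)
wave = rng.normal(size=16000)
fbank = rng.normal(size=(40, 100))
faster = speed_perturb(wave, 1.1)
masked = freq_time_mask(fbank, rng=rng)
erased = random_erase(fbank, rng=rng)
```

Speed perturbation acts on the waveform before feature extraction, whereas the two masking schemes act on the feature matrix, so they can be combined freely in a training pipeline.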
