SDAS · Sep 24, 2020

The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

arXiv:2009.11644v1 · 297 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

COUGHVID provides a large-scale, expert-labeled dataset for researchers developing cough analysis algorithms, addressing an urgent need during global health crises such as COVID-19. The contribution is incremental, however, because it builds on existing data collection efforts.

The authors tackled the lack of a validated database for training machine learning models in cough audio classification, particularly for COVID-19 screening, by creating the COUGHVID dataset with over 20,000 crowdsourced cough recordings and expert labels for more than 2,000 recordings.

Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates, and that their expert labels are consistent. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.
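The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes each recording comes with a metadata record holding a cough-detector score between 0 and 1 and a self-reported status; the field names `cough_detected` and `status`, the helper names, and the threshold of 0.8 are illustrative assumptions, not details given in the text above.

```python
from collections import Counter


def filter_by_cough_score(records, threshold=0.8, score_key="cough_detected"):
    """Keep records whose detector score is at least `threshold`.

    `score_key` and the default threshold are assumptions for illustration.
    Records without a score are dropped rather than guessed at.
    """
    kept = []
    for rec in records:
        score = rec.get(score_key)
        if score is None:
            continue
        if float(score) >= threshold:
            kept.append(rec)
    return kept


def count_by_status(records, status_key="status"):
    """Count the remaining records per self-reported status (assumed field)."""
    return Counter(rec.get(status_key, "unknown") for rec in records)


# Made-up metadata records, only to exercise the functions.
records = [
    {"id": "a", "cough_detected": 0.97, "status": "healthy"},
    {"id": "b", "cough_detected": 0.12, "status": "COVID-19"},
    {"id": "c", "cough_detected": 0.85, "status": "symptomatic"},
    {"id": "d", "status": "healthy"},  # no detector score available
]

kept = filter_by_cough_score(records)
print([rec["id"] for rec in kept])
print(count_by_status(kept))
```

A stricter threshold trades dataset size for cleaner audio, so it is worth checking how the class balance shifts after filtering before training a model on the result.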

Code Implementations: 2 repos