SD · LG · AS · Nov 18, 2025

Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report

arXiv:2511.14939v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of developing clinically robust audio-based COVID-19 detection systems for healthcare applications. Its contribution is incremental: it applies existing methods to new data with demographic controls.

The paper tackles audio-based COVID-19 detection by fine-tuning pre-trained architectures such as Audio-MAE and PANN on the Coswara and COUGHVID datasets. Results show moderate intra-dataset performance (e.g., Audio-MAE achieved 0.82 AUC on Coswara) but severe generalization failure in cross-dataset evaluation (AUC 0.43-0.68).
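For context on reading the AUC figures quoted here: AUC can be interpreted as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance and values below 0.5 mean the ranking is worse than chance. The sketch below illustrates that definition in plain Python. It is not the report's evaluation code; the function name `pairwise_auc` and the toy scores are made up for illustration.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented purely for illustration.
labels = [1, 1, 0, 0]
perfect = pairwise_auc(labels, [0.9, 0.8, 0.2, 0.1])        # positives always ranked higher
inverted = pairwise_auc(labels, [0.1, 0.2, 0.8, 0.9])       # positives always ranked lower
uninformative = pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5])  # constant score, all ties
```

Under this reading, a cross-dataset range of 0.43-0.68 spans from slightly worse than chance to modestly better than chance.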

This technical report investigates the performance of pre-trained audio models on COVID-19 detection tasks using established benchmark datasets. We fine-tuned Audio-MAE and three PANN architectures (CNN6, CNN10, CNN14) on the Coswara and COUGHVID datasets, evaluating both intra-dataset and cross-dataset generalization. We implemented strict demographic stratification by age and gender to prevent models from exploiting spurious correlations between demographic characteristics and COVID-19 status. Intra-dataset results showed moderate performance, with Audio-MAE achieving the strongest result on Coswara (0.82 AUC, 0.76 F1-score), while all models demonstrated limited performance on COUGHVID (AUC 0.58-0.63). Cross-dataset evaluation revealed severe generalization failure across all models (AUC 0.43-0.68), with Audio-MAE showing marked performance degradation (F1-score 0.00-0.08). Our experiments demonstrate that demographic balancing, while reducing apparent model performance, provides a more realistic assessment of COVID-19 detection capabilities by eliminating demographic leakage, a confounding factor that inflates performance metrics. Additionally, the limited dataset sizes after balancing (1,219-2,160 samples) proved insufficient for deep learning models, which typically require substantially larger training sets. These findings highlight fundamental challenges in developing generalizable audio-based COVID-19 detection systems and underscore the importance of rigorous demographic controls for clinically robust model evaluation.
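The abstract does not spell out how the demographic stratification was carried out. One plausible reading, sketched below as an assumption rather than the authors' procedure, is to group recordings by age band and gender and then downsample within each group so that COVID-positive and COVID-negative samples are equally represented. The record fields (`age`, `gender`, `label`), the 10-year age bands, and the function names are all hypothetical.

```python
import random
from collections import defaultdict


def age_band(age, width=10):
    """Map an age to a band label such as '20-29' (band width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def balance_by_demographics(records, seed=0):
    """Keep equal numbers of positive and negative records per (age band, gender).

    Each record is a dict with hypothetical keys 'age', 'gender' and 'label'
    (1 = COVID-positive, 0 = negative). Strata missing either class are dropped.
    """
    rng = random.Random(seed)
    strata = defaultdict(lambda: {0: [], 1: []})
    for rec in records:
        key = (age_band(rec["age"]), rec["gender"])
        strata[key][rec["label"]].append(rec)

    balanced = []
    for key in sorted(strata):
        neg, pos = strata[key][0], strata[key][1]
        k = min(len(neg), len(pos))  # size of the smaller class in this stratum
        balanced.extend(rng.sample(pos, k))
        balanced.extend(rng.sample(neg, k))
    return balanced


# Toy records, invented for illustration only.
records = [
    {"id": "a", "age": 25, "gender": "f", "label": 1},
    {"id": "b", "age": 27, "gender": "f", "label": 0},
    {"id": "c", "age": 22, "gender": "f", "label": 0},
    {"id": "d", "age": 64, "gender": "m", "label": 1},
    {"id": "e", "age": 61, "gender": "m", "label": 1},
    {"id": "f", "age": 45, "gender": "m", "label": 0},
]
balanced = balance_by_demographics(records)
```

In this toy example only the 20-29 female stratum contains both classes, so four of the six records are discarded. This mirrors the trade-off the abstract describes: balancing removes the demographic shortcut but shrinks the usable training set.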

