May 16

Speaker-Disentangled Remote Speech Detection of Asthma and COPD Exacerbations

Yuyang Yan, Sami O. Simons, Visara Urovi

arXiv:2605.16878

AI Analysis

For clinicians and patients, this work improves remote monitoring of respiratory exacerbations while enhancing privacy and interpretability, though it is incremental over existing adversarial disentanglement approaches.

The paper proposes an adversarial learning framework to disentangle pathology-related acoustic patterns from speaker-identifiable attributes for remote detection of asthma and COPD exacerbations from speech. On the TACTICAS dataset, the method improves AUC from 0.897 to 0.910 for respiratory status classification and from 0.674 to 0.793 for exacerbation type classification, while suppressing speaker information.

Early detection of exacerbations in asthma and chronic obstructive pulmonary disease (COPD) is important for timely intervention. Speech has emerged as a promising tool for continuous, non-invasive respiratory disease monitoring. However, speech signals inherently carry speaker-identifiable attributes that may dominate model predictions, potentially compromising both diagnostic performance and patient privacy. Furthermore, it remains unclear which acoustic features reflect respiratory disease and which reflect speaker identity. We propose an adversarial learning architecture that disentangles pathology-related acoustic patterns from speaker-identifiable attributes. The framework jointly optimizes two clinically hierarchical tasks: (i) respiratory status classification (stable vs. exacerbated) and (ii) exacerbation type classification (asthma exacerbation vs. COPD exacerbation). Speaker identity is suppressed through gradient reversal-based adversarial training. To enhance clinical interpretability, we employ SHapley Additive exPlanations (SHAP) to quantify the contributions of acoustic features to pathology-related predictions versus speaker identity. On the TACTICAS dataset, our method outperforms the single-task baseline on both tasks. For the respiratory status task (stable vs. exacerbated), the AUC improves from 0.897 to 0.910. For the exacerbation type task (asthma exacerbation vs. COPD exacerbation), the AUC increases from 0.674 to 0.793. Concurrently, the J-ratio decreases, confirming effective suppression of speaker information. SHAP analysis reveals how individual acoustic features contribute to both tasks. External validation on the Bridge2AI-Voice dataset further demonstrates consistent performance improvements and reduced speaker dependency, confirming cross-dataset generalizability.
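Gradient reversal-based adversarial training can be illustrated with a minimal, framework-free sketch. It assumes a shared encoder whose embedding feeds three heads (respiratory status, exacerbation type, speaker). The class and function names and the reversal weight `lam` are hypothetical, not taken from the paper. The layer acts as the identity in the forward pass and multiplies the speaker head's gradient by -lam in the backward pass. As a result, the speaker head still learns to minimize its own loss, while the encoder is updated to ascend that loss.

```python
from typing import List

Vector = List[float]


class GradientReversal:
    """Hypothetical gradient reversal layer (GRL).

    Forward pass: identity. Backward pass: scales the incoming gradient by -lam.
    """

    def __init__(self, lam: float) -> None:
        if lam < 0:
            raise ValueError("lam must be non-negative")
        self.lam = lam

    def forward(self, z: Vector) -> Vector:
        # The speaker head sees the shared embedding unchanged.
        return list(z)

    def backward(self, grad: Vector) -> Vector:
        # Flip the sign of (and scale) the gradient flowing back into the encoder.
        return [-self.lam * g for g in grad]


def embedding_gradient(
    grad_status: Vector,
    grad_type: Vector,
    grad_speaker: Vector,
    grl: GradientReversal,
) -> Vector:
    """Total gradient dL/dz reaching the shared embedding z.

    Each grad_* argument is assumed to be dL/dz already computed for that head's loss.
    Task gradients pass through unchanged; the speaker gradient is reversed by the GRL,
    so the encoder descends the task losses while ascending the lam-weighted speaker loss.
    """
    if not (len(grad_status) == len(grad_type) == len(grad_speaker)):
        raise ValueError("all gradients must have the same dimension")
    reversed_speaker = grl.backward(grad_speaker)
    return [s + t + r for s, t, r in zip(grad_status, grad_type, reversed_speaker)]


if __name__ == "__main__":
    grl = GradientReversal(lam=0.5)
    print(embedding_gradient([0.2, -0.1], [0.0, 0.3], [0.4, 0.4], grl))
```

In an autodiff framework, the same effect is usually obtained with a custom backward rule, such as a `torch.autograd.Function` subclass in PyTorch or `jax.custom_vjp` in JAX. The plain-Python version only makes the sign flip explicit.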
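The AUC values reported for both tasks are areas under the ROC curve. The following sketch illustrates what that number measures; it is not the paper's evaluation code. It computes AUC as the probability that a randomly chosen positive sample (for example, exacerbated) receives a higher score than a randomly chosen negative one (for example, stable), with ties counted as one half. This quantity equals the normalized Mann-Whitney U statistic.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    Runs in O(P * N) time, which is fine for illustration but slow for large sets.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For real evaluations, a rank-based implementation such as scikit-learn's `roc_auc_score` is the more practical choice.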
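The abstract cites a lower J-ratio as evidence of speaker suppression but does not define the metric here. A common separability score of this kind is Fisher's criterion, which compares between-speaker scatter to within-speaker scatter in the embedding space. The sketch below assumes the simple trace-ratio form tr(S_b) / tr(S_w), which may differ from the paper's exact definition. Under this form, a lower value means utterances from different speakers are harder to tell apart.

```python
from typing import Dict, Hashable, List, Sequence


def j_ratio(embeddings: Sequence[Sequence[float]], speakers: Sequence[Hashable]) -> float:
    """Assumed trace-ratio Fisher criterion for speaker separability.

    tr(S_b) = sum_k n_k * ||mu_k - mu||^2           (between-speaker scatter)
    tr(S_w) = sum_k sum_{i in k} ||x_i - mu_k||^2   (within-speaker scatter)
    """
    if not embeddings or len(embeddings) != len(speakers):
        raise ValueError("need one speaker label per embedding")
    dim = len(embeddings[0])

    def mean(rows: Sequence[Sequence[float]]) -> List[float]:
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((a[d] - b[d]) ** 2 for d in range(dim))

    # Group embeddings by speaker label.
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for x, spk in zip(embeddings, speakers):
        groups.setdefault(spk, []).append(x)

    mu = mean(embeddings)
    between = 0.0
    within = 0.0
    for rows in groups.values():
        mu_k = mean(rows)
        between += len(rows) * sq_dist(mu_k, mu)
        within += sum(sq_dist(x, mu_k) for x in rows)
    if within == 0.0:
        raise ValueError("within-speaker scatter is zero; ratio undefined")
    return between / within
```

Comparing this score on embeddings from a baseline model and from an adversarially trained model is one way to quantify the "reduced speaker dependency" described above, under the stated assumption about the metric.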
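SHAP attributes a model output to its input features using Shapley values. The `shap` library estimates these values with sampling-based or model-specific algorithms over a background dataset. The sketch below takes a simpler route: it enumerates every feature coalition exactly against a single baseline vector, which is feasible only for a handful of features. The models and feature vectors here are toy assumptions. Running the same function once on a pathology score and once on a speaker score mirrors the abstract's comparison of feature contributions to the two predictions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature of x for a scalar model output.

    Features outside a coalition are replaced by their baseline value.
    Cost is exponential in the number of features.
    """
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0..n-1 that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy scorers over three hypothetical acoustic features.
    pathology_score = lambda z: 2.0 * z[0] + 0.5 * z[1]
    speaker_score = lambda z: 1.5 * z[2] + z[0] * z[2]
    x, base = [1.0, 2.0, 1.0], [0.0, 0.0, 0.0]
    print(shapley_values(pathology_score, x, base))
    print(shapley_values(speaker_score, x, base))
```

The values satisfy the efficiency property: they sum to model(x) minus model(baseline), which is a quick sanity check on any implementation.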

