WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
This work addresses the need for a generalizable and privacy-preserving speech health diagnostic model for remote health monitoring, representing a novel method rather than an incremental improvement.
The authors tackled the problem of disease-specific and non-generalizable speech health diagnostic models with privacy concerns by proposing WavRx, which achieved state-of-the-art performance on six pathological speech datasets and significantly reduced speaker identity leakage without extra training guidance.
Speech is known to carry health-related attributes, which has emerged as a novel venue for remote and long-term health monitoring. However, existing models are usually tailored for a specific type of disease, and have been shown to lack generalizability across datasets. Furthermore, concerns have been raised recently towards the leakage of speaker identity from health embeddings. To mitigate these limitations, we propose WavRx, a speech health diagnostics model that captures the respiration and articulation related dynamics from a universal speech representation. Our in-domain and cross-domain experiments on six pathological speech datasets demonstrate WavRx as a new state-of-the-art health diagnostic model. Furthermore, we show that the amount of speaker identity entailed in the WavRx health embeddings is significantly reduced without extra guidance during training. An in-depth analysis of the model was performed, thus providing physiological interpretation of its improved generalizability and privacy-preserving ability.