SP LGJan 29, 2025

On the Bias, Fairness, and Bias Mitigation for a Wearable-based Freezing of Gait Detection in Parkinson's Disease

Timothy Odonga, Christine D. Esper, Stewart A. Factor, J. Lucas McKay, Hyeokhyen Kwon

arXiv:2502.09626v13.33 citationsh-index: 12

Originality Synthesis-oriented

AI Analysis

This addresses fairness issues in health analytics for Parkinson's patients, but it is incremental as it applies existing bias metrics and transfer learning to a specific domain.

The study tackled bias and fairness in wearable-based freezing of gait detection for Parkinson's disease by evaluating models across demographics and conditions, finding bias in all variables and showing that transfer learning improved fairness (e.g., average DPR change +0.027) and performance (e.g., average F1-score change +0.026).

Freezing of gait (FOG) is a debilitating feature of Parkinson's disease (PD), which is a cause of injurious falls among PD patients. Recent advances in wearable-based human activity recognition (HAR) technology have enabled the detection of FOG subtypes across benchmark datasets. Since FOG manifestation is heterogeneous, developing models that quantify FOG consistently across patients with varying demographics, FOG types, and PD conditions is important. Bias and fairness in FOG models remain understudied in HAR, with research focused mainly on FOG detection using single benchmark datasets. We evaluated the bias and fairness of HAR models for wearable-based FOG detection across demographics and PD conditions using multiple datasets and the effectiveness of transfer learning as a potential bias mitigation approach. Our evaluation using demographic parity ratio (DPR) and equalized odds ratio (EOR) showed model bias (DPR & EOR < 0.8) for all stratified demographic variables, including age, sex, and disease duration. Our experiments demonstrated that transfer learning from multi-site datasets and generic human activity representations significantly improved fairness (average change in DPR +0.027, +0.039, respectively) and performance (average change in F1-score +0.026, +0.018, respectively) across attributes, supporting the hypothesis that generic human activity representations learn fairer representations applicable to health analytics.

View on arXiv PDF

Similar