LG MLMay 18

Uncertainty Reliability Under Domain Shift: An Investigation for Data-Driven Blood Pressure Estimation in Photoplethysmography

Mohammad Moulaeifard, Ciaran Bench, Philip J. Aston, Nils Strodthoff

arXiv:2605.1800826.0

AI Analysis

For researchers developing trustworthy cuffless blood pressure estimation systems, this work highlights the need to jointly assess predictive accuracy and calibration on external data, providing practical guidance on method selection.

This study evaluated uncertainty quantification methods for deep learning-based blood pressure estimation from PPG signals under domain shift, finding that deep ensembles provide stronger predictive robustness than Monte Carlo dropout, and that recalibration via conformal prediction or temperature scaling is essential for reliable uncertainty, especially when using MSE loss.

Uncertainty quantification (UQ) is critical for safety-critical domains like healthcare, yet it is rarely evaluated under realistic out-of-distribution (OOD) conditions. Here, we assessed predictive performance and uncertainty reliability for deep learning-based blood pressure (BP) estimation from photoplethysmography (PPG) signals under both in-distribution (ID) and OOD settings. Using an XResNet1D-50 trained on PulseDB and tested on four external datasets, we compared deep ensembles (DE) and Monte Carlo dropout (MCD) with Gaussian negative log-likelihood (GNLL) and mean squared error (MSE) losses, optionally followed by post-hoc recalibration via conformal prediction (CP), temperature scaling (TS), and isotonic regression (IR). The key findings of our study are as follows: (1) DE provides stronger predictive robustness under domain shift than MCD, an advantage that becomes clear primarily under external shift. (2) Recalibrated GNLL-based methods yield the best uncertainty calibration (e.g., GNLL+DE+CP for systolic blood pressure (SBP), GNLL+DE+TS for diastolic blood pressure (DBP)), while MSE-based uncertainty requires recalibration to become practically useful. (3) Across settings, CP and TS offer the most consistent gains, with IR remaining competitive in several cases. Overall, our results identify DE-based methods as most robust for predictive performance under domain shift, GNLL as strongest for native UQ, and recalibration as essential for making MSE-based uncertainty practical. These findings highlight the need to jointly assess predictive accuracy and calibration on external data for trustworthy cuffless BP estimation

View on arXiv PDF

Similar