Daniel Mork

h-index6
2papers

2 Papers

LGSep 10, 2025
Signal Fidelity Index-Aware Calibration for Dementia Predictions Across Heterogeneous Real-World Data

Jingya Cheng, Jiazi Tian, Federica Spoto et al.

\textbf{Background:} Machine learning models trained on electronic health records (EHRs) often degrade across healthcare systems due to distributional shift. A fundamental but underexplored factor is diagnostic signal decay: variability in diagnostic quality and consistency across institutions, which affects the reliability of codes used for training and prediction. \textbf{Objective:} To develop a Signal Fidelity Index (SFI) quantifying diagnostic data quality at the patient level in dementia, and to test SFI-aware calibration for improving model performance across heterogeneous datasets without outcome labels. \textbf{Methods:} We built a simulation framework generating 2,500 synthetic datasets, each with 1,000 patients and realistic demographics, encounters, and coding patterns based on dementia risk factors. The SFI was derived from six interpretable components: diagnostic specificity, temporal consistency, entropy, contextual concordance, medication alignment, and trajectory stability. SFI-aware calibration applied a multiplicative adjustment, optimized across 50 simulation batches. \textbf{Results:} At the optimal parameter ($α$ = 2.0), SFI-aware calibration significantly improved all metrics (p $<$ 0.001). Gains ranged from 10.3\% for Balanced Accuracy to 32.5\% for Recall, with notable increases in Precision (31.9\%) and F1-score (26.1\%). Performance approached reference standards, with F1-score and Recall within 1\% and Balanced Accuracy and Detection Rate improved by 52.3\% and 41.1\%, respectively. \textbf{Conclusions:} Diagnostic signal decay is a tractable barrier to model generalization. SFI-aware calibration provides a practical, label-free strategy to enhance prediction across healthcare contexts, particularly for large-scale administrative datasets lacking outcome labels.

MESep 28, 2021
Heterogeneous Distributed Lag Models to Estimate Personalized Effects of Maternal Exposures to Air Pollution

Daniel Mork, Marianthi-Anna Kioumourtzoglou, Marc Weisskopf et al.

Children's health studies support an association between maternal environmental exposures and children's birth outcomes. A common goal is to identify critical windows of susceptibility--periods during gestation with increased association between maternal exposures and a future outcome. The timing of the critical windows and magnitude of the associations are likely heterogeneous across different levels of individual, family, and neighborhood characteristics. Using an administrative Colorado birth cohort we estimate the individualized relationship between weekly exposures to fine particulate matter (PM$_{2.5}$) during gestation and birth weight. To achieve this goal, we propose a statistical learning method combining distributed lag models and Bayesian additive regression trees to estimate critical windows at the individual level and identify characteristics that induce heterogeneity from a high-dimensional set of potential modifying factors. We find evidence of heterogeneity in the PM$_{2.5}$-birth weight relationship, with some mother-child dyads showing a 3 times larger decrease in birth weight for an IQR increase in exposure (5.9 to 8.5 $μg/m^3$ PM$_{2.5}$) compared to the population average. Specifically, we find increased susceptibility for non-Hispanic mothers who are either younger, have higher body mass index or lower educational attainment. Our case study is the first precision health study of critical windows.