Hongtao Hao

LG
h-index: 22
Papers: 3
Citations: 2
Novelty: 53%
AI Score: 43

3 Papers

LG | Dec 3, 2025
Bayesian Event-Based Model for Disease Subtype and Stage Inference

Hongtao Hao, Joseph L. Austerweil

Chronic diseases often progress differently across patients. Rather than varying randomly, progression typically follows one of a small number of subtypes. To capture this structured heterogeneity, the Subtype and Stage Inference Event-Based Model (SuStaIn) estimates the number of subtypes and the order of disease progression for each subtype, and assigns each patient to a subtype, primarily from cross-sectional data. It has been widely applied to uncover the subtypes of many diseases and inform our understanding of them. But how robust is its performance? In this paper, we develop a principled Bayesian subtype variant of the event-based model (BEBMS) and compare its performance to SuStaIn in a variety of synthetic data experiments with varied levels of model misspecification. BEBMS substantially outperforms SuStaIn across ordering, staging, and subtype assignment tasks. Further, we apply BEBMS and SuStaIn to a real-world Alzheimer's data set. BEBMS yields results that are more consistent with the scientific consensus on Alzheimer's disease progression than those of SuStaIn.

LG | Dec 3, 2025
Joint Progression Modeling (JPM): A Probabilistic Framework for Mixed-Pathology Progression

Hongtao Hao, Joseph L. Austerweil

Event-based models (EBMs) infer disease progression from cross-sectional data, and standard EBMs assume a single underlying disease per individual. In contrast, mixed pathologies are common in neurodegeneration. We introduce the Joint Progression Model (JPM), a probabilistic framework that treats single-disease trajectories as partial rankings and builds a prior over joint progressions. We study several JPM variants (Pairwise, Bradley-Terry, Plackett-Luce, and Mallows) and analyze three properties: (i) calibration: whether lower model energy predicts smaller distance to the ground truth ordering; (ii) separation: the degree to which sampled rankings are distinguishable from random permutations; and (iii) sharpness: the stability of sampled aggregate rankings. All variants are calibrated, and all achieve near-perfect separation; sharpness varies by variant and is well predicted by simple features of the input partial rankings (number and length of rankings, conflict, and overlap). In synthetic experiments, JPM improves ordering accuracy by roughly 21 percent over a strong EBM baseline (SA-EBM) that treats the joint disease as a single condition. Finally, using NACC, we find that the Mallows variant of JPM and the baseline model (SA-EBM) have results that are more consistent with prior literature on the possible disease progression of the mixed pathology of Alzheimer's disease (AD) and vascular dementia (VaD).
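As a rough illustration of the "energy" idea behind a Mallows-style prior over joint orderings, the sketch below scores a candidate joint progression by counting how many pairwise orderings it violates in each single-disease partial ranking. This is an assumption-laden toy, not the formulation from the paper: the function names, the dispersion parameter `theta`, and the biomarker labels are all hypothetical, and the abstract does not specify the exact energy used.

```python
from itertools import combinations


def discordant_pairs(partial, joint):
    """Count item pairs in `partial` whose relative order is reversed in `joint`.

    `partial` is an ordering over a subset of the items in `joint`.
    """
    position = {item: i for i, item in enumerate(joint)}
    # combinations() yields (x, y) with x placed before y in `partial`.
    return sum(1 for x, y in combinations(partial, 2) if position[x] > position[y])


def mallows_energy(joint, partial_rankings, theta=1.0):
    """Hypothetical Mallows-style energy: theta times the total number of
    pairwise disagreements between `joint` and every partial ranking.
    Lower energy means the joint ordering respects the inputs better."""
    return theta * sum(discordant_pairs(p, joint) for p in partial_rankings)


# Hypothetical single-disease progressions (labels are placeholders).
disease_a = ["a1", "a2", "a3"]
disease_b = ["b1", "b2"]

consistent = ["a1", "b1", "a2", "b2", "a3"]  # interleaves both orders intact
scrambled = ["a3", "a2", "a1", "b2", "b1"]   # reverses both orders

print(mallows_energy(consistent, [disease_a, disease_b]))
print(mallows_energy(scrambled, [disease_a, disease_b]))
```

Under this toy energy, an unnormalized prior weight for a joint ordering would be proportional to `exp(-energy)`, so orderings that preserve each disease's internal sequence receive more mass.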

16.3 | LG | Apr 25
TEMPO: Transformers for Temporal Disease Progression from Cross-Sectional Data

Hongtao Hao, Joseph L. Austerweil

Event-Based Models (EBMs) infer biomarker progression from cross-sectional data, but they typically yield only ordinal sequences and rely on rigid model assumptions. We propose TEMPO, a Transformer architecture that learns both ordinal and continuous event sequences through simulation-based supervised learning. TEMPO uses two Transformer modules: one treats biomarkers as tokens to infer event sequencing; the other treats patients as tokens, representing each by their per-biomarker abnormality profile, to infer patients' disease stages. On synthetic benchmarks, TEMPO reduces normalized Kendall's Tau distance by 52.89% and staging MAE by 25.33% compared to state-of-the-art SA-EBM, with larger reductions in high-dimensional settings (58.88% and 61.10%). Applied to ADNI, TEMPO recovers a biologically plausible Alzheimer's progression: early medial temporal atrophy, followed by amyloid accumulation and cognitive decline, and late-stage tau pathology with terminal acceleration of global neurodegeneration, broadly consistent with established disease models. TEMPO also eliminates the need to derive custom inference algorithms and enables rapid empirical comparison of generative hypotheses.
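The two evaluation metrics named above can be sketched in a few lines. The version below assumes the common convention of normalizing Kendall's tau distance by the total number of item pairs, n(n-1)/2, so that 0 means identical orderings and 1 means fully reversed; the abstract does not state the exact normalization, and the function names and biomarker labels here are hypothetical.

```python
from itertools import combinations


def normalized_kendall_tau_distance(order_a, order_b):
    """Fraction of item pairs that the two orderings rank in opposite order."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain the same items")
    n = len(order_a)
    if n < 2:
        return 0.0
    position_b = {item: i for i, item in enumerate(order_b)}
    # combinations() yields (x, y) with x placed before y in order_a.
    discordant = sum(
        1 for x, y in combinations(order_a, 2) if position_b[x] > position_b[y]
    )
    return discordant / (n * (n - 1) / 2)


def staging_mae(true_stages, predicted_stages):
    """Mean absolute error between true and predicted disease stages."""
    if len(true_stages) != len(predicted_stages):
        raise ValueError("stage lists must have the same length")
    errors = [abs(t - p) for t, p in zip(true_stages, predicted_stages)]
    return sum(errors) / len(errors)


# Hypothetical ground-truth and inferred biomarker orderings.
truth = ["hippocampus", "amyloid", "cognition", "tau"]
inferred = ["amyloid", "hippocampus", "cognition", "tau"]

print(normalized_kendall_tau_distance(truth, inferred))
print(staging_mae([0, 2, 4], [1, 2, 2]))
```

A reported reduction such as 52.89% would then be the relative drop in this distance versus the baseline, averaged over whatever benchmark configurations the authors used.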