Uncovering Trajectory and Topological Signatures in Multimodal Pediatric Sleep Embeddings
For pediatric sleep analysis, this paper demonstrates that geometric, topological, and clinical features provide complementary, interpretable signals beyond raw embeddings, improving calibration and robustness under extreme class imbalance.
This work investigates latent structure in multimodal pediatric sleep embeddings by augmenting them with PHATE-derived coordinates, topological summaries, and EHR data. Using simple linear and MLP models, the authors show complementary gains across four binary tasks. AUPRC improves (e.g., from 0.26 to 0.34 for desaturation), and the full fusion model gives the best calibration.
While generative models have shown promise in pediatric sleep analysis, the latent structure of their multimodal embeddings remains poorly understood. This work investigates the session-wide diagnostic information contained in sequences of 30-second pediatric polysomnography (PSG) epochs embedded by a multimodal masked autoencoder. We test whether augmenting the embeddings with three additional sources yields task-relevant signals: PHATE-derived per-epoch coordinates and whole-night movement descriptors, persistent homology summaries of the embedding cloud, and electronic health record (EHR) data. Simple linear and MLP models, chosen for interpretability rather than state-of-the-art performance, show that geometric, topological, and clinical features each provide complementary gains. Across the four binary prediction tasks, feature importance is task-dependent, and more expressive late-fusion models generally perform better. AUPRC improves from 0.26 to 0.34 for desaturation, 0.31 to 0.48 for EEG arousal, 0.09 to 0.22 for hypopnea, and 0.05 to 0.14 for apnea. We also report the Brier score and Expected Calibration Error (ECE); the full fusion model yields the best calibration on all four binary tasks. Our study shows that latent geometry and topology, together with EHR data, offer complementary, interpretable signals beyond the embeddings alone, improving calibration and robustness under extreme class imbalance.
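The abstract reports AUPRC, Brier score, and ECE without defining them. As a rough illustration, the sketch below computes all three in plain Python for a single binary task. It is not the authors' evaluation code, and the function names are hypothetical. Two assumptions are worth flagging. AUPRC is estimated as average precision, with ties broken by input order (threshold-grouping implementations such as scikit-learn's can differ slightly on tied scores). ECE uses 10 equal-width bins over the predicted positive-class probability, since the paper's binning scheme is not stated here.

```python
"""Sketch of binary-task metrics: AUPRC (as average precision), Brier score,
and Expected Calibration Error. Hypothetical helpers, not the paper's code."""
from typing import Sequence


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUPRC estimated as average precision: the mean of precision@k taken at
    the rank k of each positive, ranking by descending score.
    Assumption: tied scores keep their input order (sorted() is stable)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    tp = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            precision_sum += tp / rank  # precision at this positive's rank
    return precision_sum / n_pos


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Weighted mean gap between observed positive rate and mean predicted
    probability per bin. Assumption: equal-width bins on [0, 1]."""
    n = len(y_true)
    bins: list[list[tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the last bin
        bins[idx].append((y, p))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for _, p in b) / len(b)  # mean predicted probability
        pos_rate = sum(y for y, _ in b) / len(b)   # observed positive fraction
        ece += (len(b) / n) * abs(pos_rate - mean_conf)
    return ece


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    probs = [0.9, 0.2, 0.6, 0.4]
    print("AUPRC:", average_precision(labels, probs))
    print("Brier:", round(brier_score(labels, probs), 4))
    print("ECE:  ", round(expected_calibration_error(labels, probs), 4))
```

For context when reading the reported numbers, a no-skill classifier's expected AUPRC is roughly the positive-class prevalence. Small absolute values such as 0.05 to 0.14 for apnea are therefore best read against each task's base rate rather than against 1.0.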