PaCX-MAE: Physiology-Augmented Chest X-Ray Masked Autoencoder
For medical imaging practitioners, this work provides a way to use physiological data during pretraining without requiring it at test time, which improves performance on physiology-dependent tasks.
PaCX-MAE injects physiological priors into chest X-ray encoders via cross-modal distillation, achieving consistent improvements over domain-specific MAE on nine benchmarks, including +2.7 AUROC on MedMod and +6.5 F1 on VinDr, while remaining unimodal at inference.
Clinical diagnosis often requires combining imaging with physiological measurements, yet deployed models typically operate on unimodal data. We present PaCX-MAE, a cross-modal distillation framework that injects physiological priors into chest X-ray (CXR) encoders while remaining strictly unimodal at inference. PaCX-MAE augments in-domain masked autoencoding with a dual contrastive-predictive objective that aligns CXR representations with paired ECG and laboratory embeddings. Evaluation across nine benchmarks shows consistent improvements over domain-specific MAE, particularly on physiology-dependent tasks (e.g., +2.7 AUROC on MedMod; +6.5 F1 on VinDr). The method is highly label-efficient in the 1% label regime and preserves anatomical fidelity, achieving parity with MAE on segmentation tasks. Zero-shot and attention analyses confirm that PaCX-MAE learns to attend to physiological indicators, such as the cardiac silhouette, to which standard visual pretraining does not attend.
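To make the dual contrastive-predictive objective more concrete, the following is a minimal NumPy sketch of one plausible form it could take. It is not the authors' implementation: the symmetric InfoNCE term, the cosine-based predictive term, the function names, the temperature, and the weights `w_con` and `w_pred` are all assumptions introduced here for illustration. The abstract states only that CXR representations are aligned with paired ECG and laboratory embeddings.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(img_emb, phys_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: row i of img_emb and row i of phys_emb come from the
    same patient; every other row in the batch acts as a negative.
    """
    img = l2_normalize(img_emb)
    phys = l2_normalize(phys_emb)
    logits = img @ phys.T / temperature  # (batch, batch) similarities

    def diag_cross_entropy(lg):
        # Numerically stable log-softmax over each row, then pick
        # the log-probability of the matching (diagonal) pair.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-physiology and physiology-to-image directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


def predictive_loss(pred_phys, phys_emb):
    """Mean (1 - cosine similarity) between a physiological embedding
    predicted from the image and the actual paired embedding.

    Assumption: a cosine regression target; the source does not
    specify the form of the predictive term.
    """
    cos = np.sum(l2_normalize(pred_phys) * l2_normalize(phys_emb), axis=-1)
    return float(np.mean(1.0 - cos))


def dual_objective(img_emb, pred_phys, phys_emb, w_con=1.0, w_pred=1.0):
    """Weighted sum of the contrastive and predictive terms.

    w_con and w_pred are hypothetical weights, not values from the paper.
    """
    contrastive = info_nce(img_emb, phys_emb)
    predictive = predictive_loss(pred_phys, phys_emb)
    return float(w_con * contrastive + w_pred * predictive)
```

In a full pipeline, a term like this would be computed once per physiological modality (ECG and laboratory values) and added to the masked-autoencoding reconstruction loss. Because the physiological encoders appear only in the loss, the CXR encoder can be deployed on its own, which matches the unimodal-at-inference property described above.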