Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation
This work addresses asynchronicity in clinical multimodal fusion for healthcare prediction tasks; it is an incremental improvement over existing methods.
The paper tackles asynchronous multimodal clinical data, where the latest chest X-ray (CXR) image is often outdated relative to the electronic health record (EHR). It proposes DDL-CXR, which generates up-to-date latent representations of individualized CXR images, and experiments on MIMIC datasets show improved prediction performance.
Integrating multimodal clinical data, such as electronic health records (EHR) and chest X-ray images (CXR), is particularly beneficial for clinical prediction tasks. However, in a temporal setting, multimodal data are often inherently asynchronous. EHR can be collected continuously, but CXR is generally taken at much longer intervals due to its high cost and radiation dose. When a clinical prediction is needed, the last available CXR image may already be outdated, leading to suboptimal predictions. To address this challenge, we propose DDL-CXR, a method that dynamically generates an up-to-date latent representation of the individualized CXR image. Our approach leverages latent diffusion models for patient-specific generation, strategically conditioned on a previous CXR image and the EHR time series, which provide information about anatomical structure and disease progression, respectively. In this way, the interaction across modalities can be better captured by the latent CXR generation process, ultimately improving prediction performance. Experiments on MIMIC datasets show that the proposed model effectively addresses asynchronicity in multimodal fusion and consistently outperforms existing methods.
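The asynchronicity described in the abstract can be made concrete with a small sketch. This is not the DDL-CXR implementation: the function name `last_cxr_before`, the hour-based timestamps, and the toy ICU stay are all assumptions introduced for illustration. The sketch only shows how stale the last available CXR can be at prediction time, and which EHR observations fall in the gap that a generator conditioned on the previous CXR and the EHR time series would have to bridge.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def last_cxr_before(cxr_hours: List[float],
                    prediction_hour: float) -> Optional[Tuple[int, float]]:
    """Return (index, staleness in hours) of the most recent CXR taken at or
    before prediction_hour, or None if no CXR is available yet.

    cxr_hours must be sorted ascending. Hours since admission is a
    hypothetical time unit chosen for this sketch.
    """
    pos = bisect_right(cxr_hours, prediction_hour)
    if pos == 0:
        return None
    idx = pos - 1
    return idx, prediction_hour - cxr_hours[idx]


# Hypothetical ICU stay: EHR is charted hourly, CXR is taken only twice.
ehr_hours = [float(h) for h in range(49)]  # 0, 1, ..., 48
cxr_hours = [2.0, 20.0]
prediction_hour = 48.0

idx, staleness = last_cxr_before(cxr_hours, prediction_hour)

# EHR observations recorded after the last CXR, up to prediction time.
ehr_since_cxr = [h for h in ehr_hours if cxr_hours[idx] < h <= prediction_hour]

print(idx, staleness, len(ehr_since_cxr))  # prints: 1 28.0 28
```

In this toy stay the last image is 28 hours old at prediction time, while 28 hourly EHR observations have accumulated since it was taken. That gap is the information an outdated CXR alone cannot reflect.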