Is Clinical Text Enough? A Multimodal Study on Mortality Prediction in Heart Failure Patients
This work addresses short-term mortality prediction for heart failure patients and offers an incremental improvement over existing approaches by enhancing multimodal methods.
The study compares transformer-based models using text-only, structured-only, multimodal, and LLM-based approaches to short-term mortality prediction in heart failure patients. Supervised multimodal fusion of text and structured variables achieved the best performance, while LLMs performed inconsistently.
Accurate short-term mortality prediction in heart failure (HF) remains challenging, particularly when relying on structured electronic health record (EHR) data alone. We evaluate transformer-based models on a French HF cohort, comparing text-only, structured-only, multimodal, and LLM-based approaches. Our results show that enriching clinical text with entity-level representations improves prediction over CLS embeddings alone, and that supervised multimodal fusion of text and structured variables achieves the best overall performance. In contrast, large language models perform inconsistently across modalities and decoding strategies, with text-only prompts outperforming structured or multimodal inputs. These findings highlight that entity-aware multimodal transformers offer the most reliable solution for short-term HF outcome prediction, while current LLM prompting remains limited for clinical decision support.
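The abstract does not spell out how entity-level representations and structured variables are combined. The sketch below illustrates one common option, feature-level (late) concatenation followed by a logistic classification head, purely under stated assumptions: entity representations are taken to be mean-pooled embeddings of entity spans with the same hidden size as the CLS vector, structured EHR variables are z-scored with training-set statistics, and all names (`mean_pool`, `standardize`, `fuse`, `predict_mortality`) are hypothetical. It is not the authors' architecture, and in practice the weights would be learned (and the encoder typically fine-tuned) rather than supplied by hand.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Average entity-span embeddings (assumed pooling); zeros if no entity was found."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def standardize(values: Sequence[float], means: Sequence[float],
                stds: Sequence[float]) -> Vector:
    """Z-score structured EHR variables with (hypothetical) training-set statistics."""
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(values, means, stds)]


def fuse(cls_vec: Sequence[float], entity_vecs: Sequence[Sequence[float]],
         structured: Sequence[float], means: Sequence[float],
         stds: Sequence[float]) -> Vector:
    """Concatenate [CLS ; mean entity embedding ; standardized structured variables].

    Assumes entity embeddings share the CLS hidden size (same encoder).
    """
    dim = len(cls_vec)
    return list(cls_vec) + mean_pool(entity_vecs, dim) + standardize(structured, means, stds)


def predict_mortality(features: Sequence[float], weights: Sequence[float],
                      bias: float) -> float:
    """Logistic head: returns a probability of death within the prediction horizon."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))
```

For example, a 3-dimensional CLS vector, two entity spans, and two structured variables (illustrative values only, not data from the study) yield an 8-dimensional fused vector: `fuse([0.2, -0.1, 0.4], [[0.5, 0.0, 0.1], [0.1, 0.2, 0.3]], [78.0, 1.9], [70.0, 1.2], [10.0, 0.5])`. Concatenation keeps each modality's contribution separable in the head's weights, which makes it a simple baseline for comparing text-only, structured-only, and fused inputs.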