LG AI MA · Aug 7, 2025

MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling

Jifan Gao, Mahmudur Rahman, John Caskey, Madeline Oguss, Ann O'Rourke, Randy Brown, Anne Stey, Anoop Mayampurath, Matthew M. Churpek, Guanhua Chen, Majid Afshar

arXiv:2508.05492v1 · 4 citations · h-index: 27 · npj Digital Medicine

Originality: Incremental advance

AI Analysis

This work addresses the problem of effectively using diverse EHR data for clinical prediction. It offers a flexible solution that improves accuracy, though it appears incremental because it builds on existing LLM-based methods.

The authors tackled the challenge of integrating multimodal EHR data for clinical prediction by introducing MoMA, a mixture-of-multimodal-agents architecture that uses specialized LLM agents to convert non-textual data into summaries and an aggregator to combine them, achieving state-of-the-art performance on three real-world prediction tasks.

Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health compared to single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluated on three prediction tasks using real-world datasets with different modality combinations and prediction settings, MoMA outperforms current state-of-the-art methods, highlighting its enhanced accuracy and flexibility across various tasks.
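
The three-stage flow described in the abstract (specialist agents, then an aggregator agent, then a predictor agent) can be sketched as below. This is a minimal illustration, not the authors' implementation: the class name `MoMAPipeline`, the prompt wording, the `echo_agent` stub, and the choice to represent every non-textual input as a plain string are all assumptions made for the example. Each "agent" is any callable that maps a prompt string to a response string, so a real LLM client could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: an "agent" is any callable mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class MoMAPipeline:
    """Hypothetical sketch of the specialist -> aggregator -> predictor flow."""

    specialists: Dict[str, LLM]  # one specialist agent per non-textual modality
    aggregator: LLM              # merges summaries and clinical notes
    predictor: LLM               # produces the final prediction

    def run(self, clinical_notes: str, non_text_inputs: Dict[str, str]) -> str:
        # Stage 1: each specialist converts its modality into a textual summary.
        summaries = {
            modality: self.specialists[modality](
                f"Summarize this {modality} data:\n{payload}"
            )
            for modality, payload in non_text_inputs.items()
        }

        # Stage 2: the aggregator combines the summaries with the clinical notes.
        parts = [f"[clinical notes]\n{clinical_notes}"]
        parts += [f"[{m} summary]\n{s}" for m, s in summaries.items()]
        unified = self.aggregator(
            "Combine into one patient summary:\n" + "\n\n".join(parts)
        )

        # Stage 3: the predictor reads only the unified summary.
        return self.predictor("Predict the outcome from this summary:\n" + unified)


def echo_agent(tag: str) -> LLM:
    """Hypothetical stand-in for a real model: echoes the prompt's last line."""
    return lambda prompt: f"<{tag}:{prompt.splitlines()[-1]}>"
```

To try the sketch, build a pipeline from stubs, for example `MoMAPipeline({"labs": echo_agent("labs")}, echo_agent("agg"), echo_agent("pred"))`, and call `run` with a note string and a `{"labs": ...}` dictionary. Because the predictor only ever sees the unified summary, a modality can be added or removed by changing the `specialists` mapping alone, which is one way to read the flexibility the abstract claims.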
