LG · Nov 20, 2025

Beyond Generative AI: World Models for Clinical Prediction, Counterfactuals, and Planning

Mohammad Areeb Qazi, Maryam Nadeem, Mohammad Yaqub

arXiv:2511.16333v1 · 9.42 citations · h-index: 5

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for more reliable and data-efficient AI in healthcare, but its contribution is incremental: it primarily surveys existing work and identifies gaps.

This paper reviews world models for healthcare: systems that support clinical decision-making by learning predictive dynamics for tasks such as multistep rollouts and counterfactual evaluation. It surveys work across medical imaging, disease progression modeling, and robotic surgery, and finds that most systems reach only basic to intermediate capability levels (L1 to L2 on its rubric).

Healthcare requires AI that is predictive, reliable, and data-efficient. However, recent generative models lack the physical grounding and temporal reasoning required for clinical decision support. As scaling language models shows diminishing returns for grounded clinical reasoning, world models are gaining traction because they learn multimodal, temporally coherent, and action-conditioned representations that reflect the physical and causal structure of care. This paper reviews world models for healthcare: systems that learn predictive dynamics to enable multistep rollouts, counterfactual evaluation, and planning. We survey recent work across three domains: (i) medical imaging and diagnostics (e.g., longitudinal tumor simulation, projection-transition modeling, and predictive representation learning in the style of the Joint Embedding Predictive Architecture, JEPA), (ii) disease progression modeling from electronic health records (generative event forecasting at scale), and (iii) robotic surgery and surgical planning (action-conditioned guidance and control). We also introduce a capability rubric: L1 temporal prediction, L2 action-conditioned prediction, L3 counterfactual rollouts for decision support, and L4 planning/control. Most reviewed systems achieve L1 to L2, with fewer instances of L3 and rare instances of L4. We identify cross-cutting gaps that limit clinical reliability: under-specified action spaces and safety constraints, weak interventional validation, incomplete multimodal state construction, and limited trajectory-level uncertainty calibration. This review outlines a research agenda for clinically robust, prediction-first world models that integrate generative backbones (transformers, diffusion models, VAEs) with causal/mechanistic foundations for safe decision support in healthcare.
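To make the four rubric levels concrete, here is a toy sketch. It is an illustration under stated assumptions, not code from the review. It assumes a hypothetical scalar "severity" state with hand-set linear dynamics (`GROWTH`, `TREAT_EFFECT`), a binary treat/no-treat action, and an exhaustive-search planner. All function names and constants are invented for illustration. A real clinical world model would learn its dynamics from data, for example with the transformer, diffusion, or VAE backbones the abstract mentions.

```python
from itertools import product
from typing import Sequence

# Hypothetical toy dynamics (assumption, not from the review): severity grows
# 5% per step, and each treatment action (a = 1) lowers it by 0.2.
GROWTH = 1.05
TREAT_EFFECT = 0.2


def forecast(state: float, horizon: int) -> list[float]:
    """L1: temporal prediction of the natural course, with no action input."""
    trajectory = []
    for _ in range(horizon):
        state = GROWTH * state
        trajectory.append(state)
    return trajectory


def step(state: float, action: int) -> float:
    """L2: action-conditioned one-step prediction s_{t+1} = f(s_t, a_t)."""
    return GROWTH * state - TREAT_EFFECT * action


def rollout(state: float, actions: Sequence[int]) -> list[float]:
    """Multistep rollout: feed each predicted state back into the model."""
    trajectory = []
    for a in actions:
        state = step(state, a)
        trajectory.append(state)
    return trajectory


def counterfactual_effect(state: float, actions_a: Sequence[int],
                          actions_b: Sequence[int]) -> float:
    """L3: difference in final predicted outcome between two action
    sequences started from the same state (b minus a)."""
    return rollout(state, actions_b)[-1] - rollout(state, actions_a)[-1]


def plan(state: float, horizon: int,
         treatment_cost: float = 0.1) -> tuple[int, ...]:
    """L4: exhaustive search over binary action sequences, minimizing final
    severity plus a per-treatment penalty (a stand-in for treatment burden)."""
    def cost(actions: tuple[int, ...]) -> float:
        return rollout(state, actions)[-1] + treatment_cost * sum(actions)

    return min(product((0, 1), repeat=horizon), key=cost)


if __name__ == "__main__":
    s0 = 1.0
    print(forecast(s0, 3))                                  # untreated course
    print(counterfactual_effect(s0, [0, 0, 0], [1, 1, 1]))  # about -0.63
    print(plan(s0, 3))                                      # (1, 1, 1)
    print(plan(s0, 3, treatment_cost=0.215))                # (1, 0, 0)
```

In this toy model, treating early is worth more than treating late, because each reduction compounds through the remaining steps. With a moderate treatment penalty, the planner therefore keeps only the first treatment. Exhaustive search over `2**horizon` sequences is only feasible for tiny action spaces. The "counterfactual" in L3 is also only as trustworthy as the learned dynamics, which is why the review flags weak interventional validation as a gap.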

