LG AI

Mar 22, 2023

The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs

Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A. Pfeffer, Jason Fries, Nigam H. Shah

Stanford

arXiv:2303.12961v2

22.639 citations

h-index: 23

Originality: Synthesis-oriented

AI Analysis

This survey addresses the gap between the hype around clinical foundation models and their demonstrated utility for improving patient care and hospital operations, with a focus on how these models are evaluated.

The paper reviews over 80 foundation models for electronic medical records, finding they are often trained on limited datasets and evaluated on tasks that lack real-world healthcare relevance, and proposes a new evaluation framework to better measure their benefits.

The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models trained on non-imaging EMR data (i.e., clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly scoped clinical datasets (e.g., MIMIC-III) or broad, public biomedical corpora (e.g., PubMed) and are evaluated on tasks that do not provide meaningful insights into their usefulness to health systems. In light of these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded in metrics that matter in healthcare.
