Scott Fleming

LG
h-index14
3papers
128citations
Novelty23%
AI Score30

3 Papers

LGMar 22, 2023
The Shaky Foundations of Clinical Foundation Models: A Survey of Large Language Models and Foundation Models for EMRs

Michael Wornow, Yizhe Xu, Rahul Thapa et al. · stanford

The successes of foundation models such as ChatGPT and AlphaFold have spurred significant interest in building similar models for electronic medical records (EMRs) to improve patient care and hospital operations. However, recent hype has obscured critical gaps in our understanding of these models' capabilities. We review over 80 foundation models trained on non-imaging EMR data (i.e. clinical text and/or structured data) and create a taxonomy delineating their architectures, training data, and potential use cases. We find that most models are trained on small, narrowly-scoped clinical datasets (e.g. MIMIC-III) or broad, public biomedical corpora (e.g. PubMed) and are evaluated on tasks that do not provide meaningful insights on their usefulness to health systems. In light of these findings, we propose an improved evaluation framework for measuring the benefits of clinical foundation models that is more closely grounded to metrics that matter in healthcare.

LGOct 17, 2025
Reflections from Research Roundtables at the Conference on Health, Inference, and Learning (CHIL) 2025

Emily Alsentzer, Marie-Laure Charpignon, Bill Chen et al.

The 6th Annual Conference on Health, Inference, and Learning (CHIL 2025), hosted by the Association for Health Learning and Inference (AHLI), was held in person on June 25-27, 2025, at the University of California, Berkeley, in Berkeley, California, USA. As part of this year's program, we hosted Research Roundtables to catalyze collaborative, small-group dialogue around critical, timely topics at the intersection of machine learning and healthcare. Each roundtable was moderated by a team of senior and junior chairs who fostered open exchange, intellectual curiosity, and inclusive engagement. The sessions emphasized rigorous discussion of key challenges, exploration of emerging opportunities, and collective ideation toward actionable directions in the field. In total, eight roundtables were held by 19 roundtable chairs on topics of "Explainability, Interpretability, and Transparency," "Uncertainty, Bias, and Fairness," "Causality," "Domain Adaptation," "Foundation Models," "Learning from Small Medical Data," "Multimodal Methods," and "Scalable, Translational Healthcare Solutions."

MENov 15, 2021
Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects

Steve Yadlowsky, Scott Fleming, Nigam Shah et al.

There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the prioritization rules were derived, and only assess how well they identify individuals that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. RATE metrics subsume a number of existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We showcase RATE in the context of a number of applications, including optimal targeting of aspirin to stroke patients.