Francois Grolleau

h-index4
2papers

2 Papers

78.5CYApr 30
Adoption and Use of LLMs at an Academic Medical Center

Nigam H. Shah, Nerissa Ambers, Abby Pandya et al.

While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" from manual data entry. We developed ChatEHR, a system that enables the use of LLMs with the entire patient timeline spanning several years. ChatEHR enables automations - which are static combinations of prompts and data that perform a fixed task - and interactive use in the electronic health record (EHR) via a user interface (UI). The resulting ability to sift through patient medical records for diverse use-cases such as pre-visit chart review, screening for transfer eligibility, monitoring for surgical site infections, and chart abstraction, redefines LLM use as an institutional capability. This system, accessible after user-training, enables continuous monitoring and evaluation of LLM use. In 1.5 years, we built 7 automations and 1075 users have trained to become routine users of the UI, engaging in 23,000 sessions in the first 3 months of launch. For automations, being model-agnostic and accessing multiple types of data was essential for matching specific clinical or administrative tasks with the most appropriate LLM. Benchmark-based evaluations proved insufficient for monitoring and evaluation of the UI, requiring new methods to monitor performance. Generation of summaries was the most frequent task in the UI, with an estimated 0.73 hallucinations and 1.60 inaccuracies per generation. The resulting mix of cost savings, time savings, and revenue growth required a value assessment framework to prioritize work as well as quantify the impact of using LLMs. Initial estimates are $6M savings in the first year of use, without quantifying the benefit of the better care offered. Such a "build-from-within" strategy provides an opportunity for health systems to maintain agency via a vendor-agnostic, internally governed LLM platform.

CLSep 26, 2025
Retrieval-Augmented Guardrails for AI-Drafted Patient-Portal Messages: Error Taxonomy Construction and Large-Scale Evaluation

Wenyuan Chen, Fateme Nateghi Haredasht, Kameron C. Black et al.

Asynchronous patient-clinician messaging via EHR portals is a growing source of clinician workload, prompting interest in large language models (LLMs) to assist with draft responses. However, LLM outputs may contain clinical inaccuracies, omissions, or tone mismatches, making robust evaluation essential. Our contributions are threefold: (1) we introduce a clinically grounded error ontology comprising 5 domains and 59 granular error codes, developed through inductive coding and expert adjudication; (2) we develop a retrieval-augmented evaluation pipeline (RAEC) that leverages semantically similar historical message-response pairs to improve judgment quality; and (3) we provide a two-stage prompting architecture using DSPy to enable scalable, interpretable, and hierarchical error detection. Our approach assesses the quality of drafts both in isolation and with reference to similar past message-response pairs retrieved from institutional archives. Using a two-stage DSPy pipeline, we compared baseline and reference-enhanced evaluations on over 1,500 patient messages. Retrieval context improved error identification in domains such as clinical completeness and workflow appropriateness. Human validation on 100 messages demonstrated superior agreement (concordance = 50% vs. 33%) and performance (F1 = 0.500 vs. 0.256) of context-enhanced labels vs. baseline, supporting the use of our RAEC pipeline as AI guardrails for patient messaging.