AI | CL | IR | LO | Jan 28, 2025

VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records

arXiv:2501.16672v1 | 18 citations | h-index: 44
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in developing LLM-based applications for clinical medicine by providing a method to verify generated facts against patient records. The advance is incremental because it builds on existing techniques such as retrieval-augmented generation.

The paper tackles the problem of ensuring factual accuracy in LLM-generated clinical text by introducing VeriFact, a system that verifies text against a patient's electronic health record. VeriFact achieves up to 92.7% agreement with a clinician-derived ground truth, exceeding the highest agreement between clinicians of 88.5%.

Methods to ensure factual accuracy of text generated by large language models (LLMs) in clinical medicine are lacking. VeriFact is an artificial intelligence system that combines retrieval-augmented generation and LLM-as-a-Judge to verify whether LLM-generated text is factually supported by a patient's medical history based on their electronic health record (EHR). To evaluate this system, we introduce VeriFact-BHC, a new dataset that decomposes Brief Hospital Course narratives from discharge summaries into a set of simple statements with clinician annotations for whether each statement is supported by the patient's EHR clinical notes. Whereas the highest agreement between clinicians was 88.5%, VeriFact achieves up to 92.7% agreement when compared to a denoised and adjudicated average human clinician ground truth, suggesting that VeriFact exceeds the average clinician's ability to fact-check text against a patient's medical record. VeriFact may accelerate the development of LLM-based EHR applications by removing current evaluation bottlenecks.
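
The agreement figures quoted above are percentages of statements on which two sets of labels match. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `percent_agreement`, the label strings, and the four-statement toy data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def percent_agreement(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the percentage of statements whose two labels match exactly."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not predicted:
        raise ValueError("at least one labeled statement is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


# Hypothetical labels for four decomposed statements (illustrative only).
reference = ["supported", "supported", "not_supported", "supported"]
system = ["supported", "not_supported", "not_supported", "supported"]

score = percent_agreement(system, reference)
print(f"{score:.1f}% agreement")
```

In this toy case three of the four labels match. The same function could compare one clinician's labels with another's, which is the kind of comparison behind an inter-clinician agreement figure.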

Code Implementations: 1 repo