AI LGOct 23, 2025

Customizing Open Source LLMs for Quantitative Medication Attribute Extraction across Heterogeneous EHR Systems

Zhe Fei, Mehmet Yigit Turali, Shreyas Rajesh, Xinyang Dai, Huyen Pham, Pavan Holur, Yuhui Zhu, Larissa Mooney, Yih-Ing Hser, Vwani Roychowdhury

arXiv:2510.21027v1h-index: 64Has Code

Originality Incremental advance

AI Analysis

This work addresses the persistent barrier of inconsistent medication data for monitoring MOUD across diverse EHR systems, enabling more reliable cross-site analyses of exposure, adherence, and retention.

The paper tackled the problem of harmonizing medication data across heterogeneous Electronic Health Record (EHR) systems by customizing open source large language models (LLMs) to extract standardized prescription attributes for medications for opioid use disorder (MOUD), achieving up to 93.4% coverage and 93.0% exact-match accuracy on a dataset of 25,605 records.

Harmonizing medication data across Electronic Health Record (EHR) systems is a persistent barrier to monitoring medications for opioid use disorder (MOUD). In heterogeneous EHR systems, key prescription attributes are scattered across differently formatted fields and freetext notes. We present a practical framework that customizes open source large language models (LLMs), including Llama, Qwen, Gemma, and MedGemma, to extract a unified set of MOUD prescription attributes (prescription date, drug name, duration, total quantity, daily quantity, and refills) from heterogeneous, site specific data and compute a standardized metric of medication coverage, \emph{MOUD days}, per patient. Our pipeline processes records directly in a fixed JSON schema, followed by lightweight normalization and cross-field consistency checks. We evaluate the system on prescription level EHR data from five clinics in a national OUD study (25{,}605 records from 1{,}257 patients), using a previously annotated benchmark of 10{,}369 records (776 patients) as the ground truth. Performance is reported as coverage (share of records with a valid, matchable output) and record-level exact-match accuracy. Larger models perform best overall: Qwen2.5-32B achieves \textbf{93.4\%} coverage with \textbf{93.0\%} exact-match accuracy across clinics, and MedGemma-27B attains \textbf{93.1\%}/\textbf{92.2\%}. A brief error review highlights three common issues and fixes: imputing missing dosage fields using within-drug norms, handling monthly/weekly injectables (e.g., Vivitrol) by setting duration from the documented schedule, and adding unit checks to prevent mass units (e.g., ``250 g'') from being misread as daily counts. By removing brittle, site-specific ETL and supporting local, privacy-preserving deployment, this approach enables consistent cross-site analyses of MOUD exposure, adherence, and retention in real-world settings.

View on arXiv PDF

Similar