CL · Oct 22, 2025

Automated HIV Screening on Dutch Electronic Health Records with Large Language Models

Lang Zhou, Amrish Jhingoer, Yinghao Luo, Klaske Vliegenthart-Jongbloed, Carlijn Jordans, Ben Werkhoven, Tom Seinen, Erik van Mulligen, Casper Rokx, Yunlei Li

arXiv:2510.19879v2
h-index: 27

Originality: Incremental advance

AI Analysis

This paper addresses the problem of efficient HIV screening for healthcare providers by leveraging unstructured clinical notes. The advance is incremental, as it builds on existing machine learning approaches.

The study tackled HIV screening by developing a pipeline using a Large Language Model to analyze unstructured text in Electronic Health Records, achieving high accuracy and a low false negative rate on clinical data from Erasmus University Medical Center Rotterdam.

Efficient screening and early diagnosis of HIV are critical for reducing onward transmission. Although large-scale laboratory testing is not feasible, the widespread adoption of Electronic Health Records (EHRs) offers new opportunities to address this challenge. Existing research primarily focuses on applying machine learning methods to structured data, such as patient demographics, to improve HIV diagnosis. However, these approaches often overlook unstructured text data such as clinical notes, which may contain valuable information relevant to HIV risk. In this study, we propose a novel pipeline that leverages a Large Language Model (LLM) to analyze unstructured EHR text and determine a patient's eligibility for further HIV testing. Experimental results on clinical data from Erasmus University Medical Center Rotterdam demonstrate that our pipeline achieved high accuracy while maintaining a low false negative rate.
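The abstract reports two metrics, accuracy and false negative rate, without defining them. The sketch below shows how they are conventionally computed for a binary screening decision. It is not the authors' evaluation code: the class and function names are hypothetical, and the labels in the usage note are made-up illustrative values, not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary screen (1 = eligible for HIV testing, 0 = not)."""
    tp: int  # eligible patients correctly flagged
    fp: int  # non-eligible patients incorrectly flagged
    tn: int  # non-eligible patients correctly not flagged
    fn: int  # eligible patients the screen missed


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally the confusion counts from reference labels and screen outputs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all patients classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no samples")
    return (c.tp + c.tn) / total


def false_negative_rate(c: ConfusionCounts) -> float:
    """Fraction of truly eligible patients that the screen missed."""
    positives = c.tp + c.fn
    if positives == 0:
        raise ValueError("no positive samples")
    return c.fn / positives
```

For example, with illustrative labels `y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]`, the counts are 3 true positives, 1 false negative, 5 true negatives, and 1 false positive, giving an accuracy of 0.8 and a false negative rate of 0.25. In a screening setting the false negative rate is the more important of the two, since each false negative is a patient who should have been offered testing and was not.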
