IR · LG · Mar 6

OpenExtract: Automated Data Extraction for Systematic Reviews in Health

Jim Achterberg, Bram Van Dijk, Jing Meng, Saif Ul Islam, Gregory Epiphaniou, Carsten Maple, Xuefei Ding, Theodoros N. Arvanitis, Simon Brouwer, Marcel Haas, Marco Spruit

arXiv:2603.13338 · h-index: 6 · Has Code

AI Analysis

This work addresses the efficiency challenge facing researchers who conduct systematic reviews, though it appears incremental because it applies existing LLM methods to a specific domain.

The study tackles automated data extraction for systematic reviews in health by developing OpenExtract, an open-source pipeline that uses LLMs, and reports precision and recall scores > 0.8 in a digital health review.

This study presents OpenExtract, an open-source pipeline for automated data extraction in large-scale systematic literature reviews. The pipeline queries large language models (LLMs) to predict data entries based on relevant sections of scientific articles. To test the efficacy of OpenExtract, we apply it to a systematic literature review in digital health and compare its outputs with those of human researchers. OpenExtract achieves precision and recall scores of > 0.8 in this task, indicating that it can be effective at extracting data automatically and efficiently. OpenExtract: https://github.com/JimAchterbergLUMC/OpenExtract.
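To make the reported metrics concrete, the sketch below shows one way precision and recall could be computed when pipeline outputs are compared with human-extracted entries. The abstract does not say how entries are matched, so the exact-match rule on (article, field, value) tuples, the function name `precision_recall`, and the toy data are all assumptions for illustration, not OpenExtract's actual evaluation code.

```python
def precision_recall(predicted, reference):
    """Score predicted entries against reference entries by exact match.

    Each entry is assumed to be an (article_id, field, value) tuple.
    This matching rule is an illustrative assumption, not the paper's method.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: share of predicted entries that the human reviewers also recorded.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of human-recorded entries that the pipeline recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
human = {
    ("paper_1", "study_design", "RCT"),
    ("paper_1", "sample_size", "120"),
    ("paper_2", "study_design", "cohort"),
    ("paper_2", "sample_size", "45"),
}
model = {
    ("paper_1", "study_design", "RCT"),
    ("paper_1", "sample_size", "120"),
    ("paper_2", "study_design", "cohort"),
    ("paper_2", "sample_size", "54"),  # transposed digits: counts as an error
}

precision, recall = precision_recall(model, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching is the strictest choice; free-text fields would likely need normalization or a more lenient comparison, which this sketch leaves out.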
