IR | CL | Jan 12, 2024

Zero-shot Generative Large Language Models for Systematic Review Screening Automation

arXiv:2401.06320v2 | 31 citations | h-index: 41 | ECIR
AI Analysis

This work addresses the resource-intensive screening phase in systematic reviews for evidence-based medicine, offering a practical automation solution.

This study evaluates zero-shot large language models (LLMs) for automating systematic review screening. It finds that combining instruction fine-tuning, recall-based calibration, and an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.

Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models (LLMs) for automatic screening. We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold to determine whether a publication should be included in a systematic review. Our comprehensive evaluation using five standard test collections shows that instruction fine-tuning plays an important role in screening, that calibration renders LLMs practical for achieving a targeted recall, and that combining both with an ensemble of zero-shot models saves significant screening time compared to state-of-the-art approaches.
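The calibration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each model produces a numeric inclusion score per abstract (the abstract does not say how that score is derived), that a small labelled calibration set is available, and that the ensemble simply averages scores across models. The names `calibrate_threshold`, `screen`, and `ensemble_scores` are hypothetical.

```python
import math


def calibrate_threshold(scores, labels, target_recall):
    """Pick the highest score threshold whose recall on a labelled
    calibration set is at least target_recall (assumed procedure)."""
    # Scores of the documents that reviewers actually included, best first.
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("calibration set contains no included documents")
    # How many included documents must score at or above the threshold.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def screen(scores, threshold):
    """Mark a document for inclusion when its score reaches the threshold."""
    return [s >= threshold for s in scores]


def ensemble_scores(per_model_scores):
    """Average the per-document scores of several models (assumed
    combination rule; the source does not state how the ensemble is built)."""
    return [sum(col) / len(col) for col in zip(*per_model_scores)]


if __name__ == "__main__":
    # Toy calibration data: scores from one hypothetical model and gold labels.
    calib_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
    calib_labels = [1, 0, 1, 0, 1, 0]
    theta = calibrate_threshold(calib_scores, calib_labels, target_recall=0.95)
    print("threshold:", theta)
    print("decisions:", screen([0.7, 0.15, 0.25], theta))
```

A higher target recall pushes the threshold down, so fewer relevant abstracts are missed but more abstracts remain for manual screening. That trade-off is what ties a recall target to screening time saved.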

