CL · Nov 14, 2023

Automated title and abstract screening for scoping reviews using the GPT-4 Large Language Model

arXiv:2311.07918v1 · 8 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the labor-intensive screening process that researchers face when conducting scoping reviews, and it offers a user-friendly tool to support scholarly work. The advance is incremental: it builds on existing LLM methods and does not achieve human-level performance.

The paper tackles the labor-intensive task of title and abstract screening for scoping reviews by introducing GPTscreenR, an R package that uses GPT-4 with the chain-of-thought technique. In validation against consensus human reviewer decisions, it achieved a sensitivity of 71%, specificity of 89%, and overall accuracy of 84%, performing similarly to an alternative zero-shot method. Neither method reached perfect accuracy or human levels of intraobserver agreement.

Scoping reviews, a type of literature review, require intensive human effort to screen large numbers of scholarly sources for their relevance to the review objectives. This manuscript introduces GPTscreenR, a package for the R statistical programming language that uses the GPT-4 Large Language Model (LLM) to automatically screen sources. The package makes use of the chain-of-thought technique with the goal of maximising performance on complex screening tasks. In validation against consensus human reviewer decisions, GPTscreenR performed similarly to an alternative zero-shot technique, with a sensitivity of 71%, specificity of 89%, and overall accuracy of 84%. Neither method achieved perfect accuracy nor human levels of intraobserver agreement. GPTscreenR demonstrates the potential for LLMs to support scholarly work and provides a user-friendly software framework that can be integrated into existing review processes.
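The three validation figures quoted above follow the standard confusion-matrix definitions: sensitivity is the share of sources the human reviewers included that the screener also included, specificity is the share of human-excluded sources that the screener also excluded, and accuracy is the share of all decisions that match. The sketch below is a minimal illustration of that arithmetic in Python. It is not code from GPTscreenR, the function and class names are hypothetical, and the ten decisions in the usage lines are invented toy data, not the paper's validation set.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    accuracy: float     # (TP + TN) / total


def screening_metrics(human: Sequence[bool], model: Sequence[bool]) -> ScreeningMetrics:
    """Compare model include/exclude decisions against human consensus.

    True means "include the source", False means "exclude it".
    Hypothetical helper for illustration only.
    """
    if len(human) != len(model):
        raise ValueError("human and model decision lists must have equal length")

    pairs = list(zip(human, model))
    tp = sum(1 for h, m in pairs if h and m)          # both include
    tn = sum(1 for h, m in pairs if not h and not m)  # both exclude
    fp = sum(1 for h, m in pairs if not h and m)      # model includes, humans exclude
    fn = sum(1 for h, m in pairs if h and not m)      # model excludes, humans include

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one human include and one human exclude")

    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(pairs),
    )


# Toy data (invented): 3 human includes followed by 7 human excludes.
human_decisions = [True, True, True] + [False] * 7
model_decisions = [True, True, False] + [False] * 6 + [True]

metrics = screening_metrics(human_decisions, model_decisions)
print(f"sensitivity={metrics.sensitivity:.2f}")
print(f"specificity={metrics.specificity:.2f}")
print(f"accuracy={metrics.accuracy:.2f}")
```

In screening, sensitivity is usually the figure to watch most closely, because a false negative removes a relevant source from the review, whereas a false positive only costs some extra reading at the full-text stage.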

Code Implementations: 1 repo