CL · Nov 14, 2023

Automated title and abstract screening for scoping reviews using the GPT-4 Large Language Model

arXiv:2311.07918v1 · 8 citations · h-index: 5

Originality: Incremental advance

AI Analysis

This work addresses the labor-intensive screening stage faced by researchers conducting scoping reviews and offers a user-friendly tool to support scholarly work. It is an incremental advance: it builds on existing LLM methods and does not reach human-level performance.

The paper tackles the problem of automating title and abstract screening for scoping reviews, a step that requires intensive human effort. It introduces GPTscreenR, an R package that uses GPT-4 with chain-of-thought prompting. In validation against consensus human reviewer decisions, it achieved a sensitivity of 71%, a specificity of 89%, and an overall accuracy of 84%, performing similarly to an alternative zero-shot method but reaching neither perfect accuracy nor human levels of agreement.

Scoping reviews, a type of literature review, require intensive human effort to screen large numbers of scholarly sources for their relevance to the review objectives. This manuscript introduces GPTscreenR, a package for the R statistical programming language that uses the GPT-4 Large Language Model (LLM) to automatically screen sources. The package makes use of the chain-of-thought technique with the goal of maximising performance on complex screening tasks. In validation against consensus human reviewer decisions, GPTscreenR performed similarly to an alternative zero-shot technique, with a sensitivity of 71%, specificity of 89%, and overall accuracy of 84%. Neither method achieved perfect accuracy nor human levels of intraobserver agreement. GPTscreenR demonstrates the potential for LLMs to support scholarly work and provides a user-friendly software framework that can be integrated into existing review processes.
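Sensitivity, specificity, and accuracy are standard binary-classification metrics. The sketch below shows one way to compute them by comparing a model's include/exclude decisions with consensus human decisions, treating "include" as the positive class and the human consensus as ground truth. It is written in Python for illustration (GPTscreenR itself is an R package); the function `screening_metrics`, the `ScreeningMetrics` type, and the toy decisions are hypothetical and are not part of GPTscreenR or the paper's data.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    sensitivity: float  # share of human-included sources the model also included
    specificity: float  # share of human-excluded sources the model also excluded
    accuracy: float     # share of all sources where model and humans agree


def screening_metrics(model: list[bool], human: list[bool]) -> ScreeningMetrics:
    """Hypothetical helper: score model decisions against human consensus.

    True means "include"; human consensus is treated as ground truth.
    """
    if len(model) != len(human):
        raise ValueError("decision lists must have the same length")
    pairs = list(zip(model, human))
    tp = sum(m and h for m, h in pairs)          # both include
    tn = sum(not m and not h for m, h in pairs)  # both exclude
    fp = sum(m and not h for m, h in pairs)      # model includes, humans exclude
    fn = sum(not m and h for m, h in pairs)      # model excludes, humans include
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one human include and one human exclude")
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(pairs),
    )


if __name__ == "__main__":
    # Toy data (not from the paper): 4 human includes, 6 human excludes.
    human = [True] * 4 + [False] * 6
    model = [True, True, True, False] + [False] * 5 + [True]
    print(screening_metrics(model, human))
    # sensitivity=0.75, specificity≈0.833, accuracy=0.8
```

Accuracy is the prevalence-weighted average of sensitivity and specificity (here 0.4 × 0.75 + 0.6 × 5/6 = 0.8), which is why an overall accuracy such as 84% falls between the reported sensitivity and specificity.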
