Detecting Reference Errors in Scientific Literature with Large Language Models
This work addresses the time-consuming task of detecting reference errors in scientific publishing, but its contribution is incremental because it applies existing models to a new dataset.
This work tackled the problem of detecting reference errors in scientific literature by evaluating large language models from OpenAI's GPT family, showing they can detect erroneous citations with limited context and without fine-tuning.
Reference errors, such as citation and quotation errors, are common in scientific papers. Such errors can result in the propagation of inaccurate information, but are difficult and time-consuming to detect, posing a significant challenge to scientific publishing. To support automatic detection of reference errors, this work evaluated the ability of large language models in OpenAI's GPT family to detect quotation errors. Specifically, we prepared an expert-annotated, general-domain dataset of statement-reference pairs from journal articles. Large language models were evaluated in different settings with varying amounts of reference information provided by retrieval augmentation. Our results showed that large language models are able to detect erroneous citations with limited context and without fine-tuning. This study contributes to the growing literature that seeks to utilize artificial intelligence to assist in the writing, reviewing, and publishing of scientific papers. Potential avenues for further improvements in this task are also discussed.
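The evaluation settings described above, in which a model sees a statement together with varying amounts of reference information, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the function names, the prompt wording, the "supported"/"unsupported" labels, and the token-overlap ranking, which stands in for whatever retriever is actually used. The parameter `k` controls how many reference excerpts are added, so `k=0` corresponds to a limited-context setting in which only the reference title is shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(statement, passages, k):
    """Return the k passages that share the most tokens with the statement.

    Token overlap is only a placeholder for a real retriever (assumption).
    """
    query = Counter(tokenize(statement))

    def overlap(passage):
        # Multiset intersection counts the tokens common to both texts.
        return sum((query & Counter(tokenize(passage))).values())

    ranked = sorted(passages, key=overlap, reverse=True)
    return ranked[:k]


def build_prompt(statement, title, passages, k):
    """Assemble a prompt for one statement-reference pair.

    With k=0 the prompt contains only the statement and the reference title;
    larger k appends the top-k retrieved excerpts from the reference.
    """
    lines = [
        "Does the cited reference support the statement? "
        "Answer 'supported' or 'unsupported'.",
        f"Statement: {statement}",
        f"Reference title: {title}",
    ]
    for i, passage in enumerate(retrieve(statement, passages, k), start=1):
        lines.append(f"Reference excerpt {i}: {passage}")
    return "\n".join(lines)
```

The resulting prompt string would then be sent to a language model, and the model's answer compared against the expert annotation for that pair. Varying `k` across runs gives one simple way to measure how much reference context the model needs.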