CL AI Dec 18, 2024

Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition

Xuemei Tang, Xufeng Duan, Zhenguang G. Cai

arXiv:2412.13612v59.616 citations h-index: 7 EMNLP

Originality Synthesis-oriented

AI Analysis

This work addresses the problem of unreliable automation in academic literature reviews for researchers. Its contribution is incremental, since it builds on existing evaluation frameworks.

This study evaluated large language models (LLMs) for automating literature reviews, finding that even advanced models generate hallucinated references and performance varies across disciplines.

Large language models (LLMs) have emerged as a potential solution to automate the complex processes involved in writing literature reviews, such as literature collection, organization, and summarization. However, it is yet unclear how good LLMs are at automating comprehensive and reliable literature reviews. This study introduces a framework to automatically evaluate the performance of LLMs in three key tasks of literature writing: reference generation, literature summary, and literature review composition. We introduce multidimensional evaluation metrics that assess the hallucination rates in generated references and measure the semantic coverage and factual consistency of the literature summaries and compositions against human-written counterparts. The experimental results reveal that even the most advanced models still generate hallucinated references, despite recent progress. Moreover, we observe that the performance of different models varies across disciplines when it comes to writing literature reviews. These findings highlight the need for further research and development to improve the reliability of LLMs in automating academic literature reviews.
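To make the idea of a reference hallucination rate concrete, here is a minimal sketch. It assumes a generated reference counts as hallucinated when its normalized title matches no entry in a trusted list of known titles. The abstract does not describe the paper's actual matching procedure, so the function names `normalize_title` and `hallucination_rate` and the exact-title criterion are illustrative assumptions, not the authors' method.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def hallucination_rate(generated_titles, known_titles):
    """Return the fraction of generated titles with no match among the known titles.

    Assumption: a title match after normalization is enough to count a
    reference as real. A fuller check might also compare authors, venue, and year.
    """
    if not generated_titles:
        return 0.0  # nothing was generated, so nothing was hallucinated
    known = {normalize_title(t) for t in known_titles}
    unmatched = sum(1 for t in generated_titles if normalize_title(t) not in known)
    return unmatched / len(generated_titles)
```

Under these assumptions, calling `hallucination_rate` with two generated titles, one of which appears in the known list, gives a rate of one half. Exact matching is strict, so a real evaluation would likely need fuzzy matching or a lookup against a bibliographic database.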
