Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

Wenqing Wu, Chengzhi Zhang, Yi Zhao, Tong Bao

arXiv:2604.1957812.8

Predicted impact top 75% in CL · last 90 daysOriginality Incremental advance

AI Analysis

For the academic peer review community, this paper provides empirical evidence that LLMs are shifting the evaluative focus of reviews, potentially undermining core functions of peer review.

The study analyzes how LLMs have changed peer review reports in top AI conferences, finding that post-LLM reviews are longer, more fluent, and focus more on surface-level clarity while reducing attention to deeper evaluative dimensions like originality and replicability.

With the rapid advancement of Large Language Models (LLMs), the academic community has faced unprecedented disruptions, particularly in the realm of academic communication. The primary function of peer review is improving the quality of academic manuscripts, such as clarity, originality and other evaluation aspects. Although prior studies suggest that LLMs are beginning to influence peer review, it remains unclear whether they are altering its core evaluative functions. Moreover, the extent to which LLMs affect the linguistic form, evaluative focus, and recommendation-related signals of peer-review reports has yet to be systematically examined. In this study, we examine the changes in peer review reports for academic articles following the emergence of LLMs, emphasizing variations at fine-grained level. Specifically, we investigate linguistic features such as the length and complexity of words and sentences in review comments, while also automatically annotating the evaluation aspects of individual review sentences. We also use a maximum likelihood estimation method, previously established, to identify review reports that potentially have modified or generated by LLMs. Finally, we assess the impact of evaluation aspects mentioned in LLM-assisted review reports on the informativeness of recommendation for paper decision-making. The results indicate that following the emergence of LLMs, peer review texts have become longer and more fluent, with increased emphasis on summaries and surface-level clarity, as well as more standardized linguistic patterns, particularly reviewers with lower confidence score. At the same time, attention to deeper evaluative dimensions, such as originality, replicability, and nuanced critical reasoning, has declined.

View on arXiv PDF

Similar