ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews
This work addresses the challenge of automating scientific paper revisions for researchers and editors. Its contribution is incremental, however: it focuses on dataset creation and benchmarking rather than on a novel solution.
The authors tackle the problem of automatically revising scientific papers based on peer feedback by introducing ARIES, a dataset of review comments and the corresponding paper edits from computer science. They find that state-of-the-art models struggle to identify which edits correspond to a given comment, especially when indirect reasoning is required.
We introduce the task of automatically revising scientific papers based on peer feedback and release ARIES, a dataset of review comments and their corresponding paper edits. The data is drawn from real reviewer-author interactions from computer science, and we provide labels linking each reviewer comment to the specific paper edits made by the author in response. We automatically create a high-precision silver training set, as well as an expert-labeled test set that shows high inter-annotator agreement. In experiments with 10 models covering the state of the art, we find that they struggle even to identify which edits correspond to a comment -- especially when the relationship between the edit and the comment is indirect and requires reasoning to uncover. We also extensively analyze GPT-4's ability to generate edits given a comment and the original paper. We find that it often succeeds on a superficial level, but tends to rigidly follow the wording of the feedback rather than the underlying intent, and lacks technical details compared to human-written edits.
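To make the comment-to-edit identification task concrete, the following is a minimal sketch of how one might represent a paper edit and rank candidate edits against a reviewer comment using plain token overlap. It is an illustration only, not one of the 10 models evaluated in the paper and not the ARIES data format: the `Edit` class, its fields, the `overlap_score` and `rank_edits` helpers, and the example comment and edits are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical container for one paragraph-level paper edit."""
    edit_id: int
    source_text: str  # paragraph before revision ("" if newly added)
    target_text: str  # paragraph after revision ("" if deleted)


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(comment: str, edit: Edit) -> float:
    """Jaccard overlap between the comment and the edit's revised text."""
    c = tokens(comment)
    e = tokens(edit.target_text)
    if not c or not e:
        return 0.0
    return len(c & e) / len(c | e)


def rank_edits(comment: str, edits: list[Edit]) -> list[Edit]:
    """Return the edits sorted from most to least lexically similar."""
    return sorted(edits, key=lambda ed: overlap_score(comment, ed), reverse=True)


# Made-up example: one unrelated edit and one that addresses the comment.
comment = "The paper does not report the learning rate used for training."
edits = [
    Edit(0, "", "We thank the anonymous reviewers."),
    Edit(1, "We train for 10 epochs.",
         "We train for 10 epochs with a learning rate of 0.001."),
]
ranked = rank_edits(comment, edits)
print([ed.edit_id for ed in ranked])
```

A purely lexical scorer like this can only surface edits that reuse the reviewer's wording. It has no way to recover the indirect comment-edit relationships that, according to the abstract, require reasoning to uncover.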