AI · Mar 28

Defend: Automated Rebuttals for Peer Review with Minimal Author Guidance

arXiv:2603.27360 · 50.81 citations · h-index: 10

AI Analysis

For researchers and authors in peer review, DEFEND reduces cognitive load while improving rebuttal quality, but the approach is incremental, as it combines existing LLM capabilities with structured reasoning.

The paper introduces DEFEND, an LLM-based tool for automated rebuttal generation in peer review that keeps the author in the loop with minimal intervention. Experiments show that DEFEND improves factual correctness and refutation strength over direct LLM use, with segment-wise generation and author-in-the-loop approaches yielding substantial gains.

Rebuttal generation is a critical component of the peer review process for scientific papers, enabling authors to clarify misunderstandings, correct factual inaccuracies, and guide reviewers toward a more accurate evaluation. We observe that Large Language Models (LLMs) often struggle to perform targeted refutation and maintain accurate factual grounding when used directly for rebuttal generation, highlighting the need for structured reasoning and author intervention. To address this, we introduce DEFEND, an LLM-based tool designed to explicitly execute the underlying reasoning process of automated rebuttal generation while keeping the author in the loop. Rather than writing rebuttals from scratch, the author only needs to drive the reasoning process with minimal intervention, which makes the approach efficient and reduces effort and cognitive load. We compare DEFEND against three other paradigms: (i) direct rebuttal generation using an LLM (DRG), (ii) segment-wise rebuttal generation using an LLM (SWRG), and (iii) a sequential approach (SA) of segment-wise rebuttal generation without author intervention. To enable fine-grained evaluation, we extend the ReviewCritique dataset with review segmentation, deficiency and error-type annotations, rebuttal-action labels, and a mapping to gold rebuttal segments. Experimental results and a user study demonstrate that directly using LLMs performs poorly in factual correctness and targeted refutation. Segment-wise generation and the automated sequential approach with the author in the loop substantially improve factual correctness and strength of refutation.
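To make the difference between the paradigms concrete, the following is a minimal sketch, not DEFEND's actual implementation. It contrasts direct generation with segment-wise generation and shows where an optional author hook could intervene. Every name here (`ReviewSegment`, `split_review`, `segment_wise_rebuttal`, the stub generator and hook) is hypothetical, the line-based segmentation is an assumption, and the LLM call is replaced by a stub so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ReviewSegment:
    """One unit of a review. Field names are illustrative, not the paper's schema."""
    text: str
    deficiency: Optional[str] = None       # hypothetical annotation slot
    error_type: Optional[str] = None       # hypothetical annotation slot
    rebuttal_action: Optional[str] = None  # hypothetical annotation slot


# A generator maps (segment, paper text) to a draft reply; in practice this
# would wrap an LLM call. An author hook maps (segment, draft) to a final reply.
Generator = Callable[[ReviewSegment, str], str]
AuthorHook = Callable[[ReviewSegment, str], str]


def split_review(review: str) -> List[ReviewSegment]:
    """Naive segmentation (assumption): one segment per non-empty line."""
    return [ReviewSegment(line.strip()) for line in review.splitlines() if line.strip()]


def direct_rebuttal(review: str, paper: str, generate: Generator) -> str:
    """Direct paradigm: the whole review is answered in a single call."""
    return generate(ReviewSegment(review), paper)


def segment_wise_rebuttal(
    review: str,
    paper: str,
    generate: Generator,
    author_hook: Optional[AuthorHook] = None,
) -> List[str]:
    """Segment-wise paradigm: one reply per segment.

    If author_hook is given, the author can confirm or correct each draft,
    which is a rough stand-in for keeping the author in the loop.
    """
    replies = []
    for segment in split_review(review):
        draft = generate(segment, paper)
        if author_hook is not None:
            draft = author_hook(segment, draft)
        replies.append(draft)
    return replies


def stub_generate(segment: ReviewSegment, paper: str) -> str:
    """Placeholder for an LLM call; it only echoes the segment."""
    return f"Re: {segment.text}"


def stub_author(segment: ReviewSegment, draft: str) -> str:
    """Placeholder for author intervention; it only marks the draft as checked."""
    return draft + " [checked by author]"
```

Calling `segment_wise_rebuttal(review, paper, stub_generate, author_hook=stub_author)` returns one reply per review line, each passed through the author hook. Omitting `author_hook` gives the fully automated variant.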
