Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in Conference Peer Review
This research addresses the potential for reviewer bias against resubmissions, which could unfairly penalize authors and impact the quality of peer review in ML/CS conferences.
This study investigates whether novice reviewers exhibit bias against resubmitted papers in machine learning and computer science conferences. It finds that reviewers who knew a paper was a resubmission gave an average overall score almost 1 point lower on a 10-point scale (Δ = -0.78) than those unaware of its resubmission status.
Modern machine learning and computer science conferences are experiencing a surge in the number of submissions that challenges the quality of peer review, as the number of competent reviewers is growing at a much slower rate. To curb this trend and reduce the burden on reviewers, several conferences have started encouraging or even requiring authors to declare the previous submission history of their papers. Such initiatives have been met with skepticism among authors, who raise concerns about a potential bias in reviewers' recommendations induced by this information. In this work, we investigate whether reviewers exhibit a bias caused by the knowledge that the submission under review was previously rejected at a similar venue, focusing on a population of novice reviewers who constitute a large fraction of the reviewer pool in leading machine learning and computer science conferences. We design and conduct a randomized controlled trial closely replicating the relevant components of the peer-review pipeline with $133$ reviewers (master's students, junior PhD students, and recent graduates of top US universities) writing reviews for $19$ papers. The analysis reveals that reviewers indeed become negatively biased when they receive a signal that the paper is a resubmission, giving an overall score almost 1 point lower on a 10-point Likert item ($\Delta = -0.78, \ 95\% \ \text{CI} = [-1.30, -0.24]$) than reviewers who do not receive such a signal. Looking at specific criteria scores (originality, quality, clarity, and significance), we observe that novice reviewers tend to underrate quality the most.
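To make the reported effect size concrete, here is a minimal sketch of how a difference in mean scores between two randomized groups, with a 95% interval, might be computed. It is not the authors' analysis: the scores are made up for illustration, the function names `diff_in_means` and `bootstrap_ci` are hypothetical, and the simple percentile bootstrap used here ignores that several reviewers rate the same paper, which a real analysis of this design would likely need to account for.

```python
import random
from statistics import mean


def diff_in_means(treated, control):
    """Point estimate: mean treated score minus mean control score."""
    return mean(treated) - mean(control)


def bootstrap_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the difference in means.

    Each group is resampled with replacement independently. This is an
    illustrative assumption, not the procedure used in the study.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        diffs.append(mean(t) - mean(c))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up overall scores on a 10-point scale (NOT data from the study).
control = [6, 7, 5, 6, 8, 7, 6, 5, 7, 6]  # no resubmission signal
treated = [5, 6, 5, 4, 7, 6, 5, 5, 6, 5]  # told the paper is a resubmission

delta = diff_in_means(treated, control)
lo, hi = bootstrap_ci(treated, control)
print(f"delta = {delta:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

A negative `delta` whose interval excludes zero would correspond to the kind of result reported above, where the signal group scores lower than the control group.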