DL · LG · Sep 20, 2021

Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment

arXiv:2109.09774v1 · 73 citations
Originality: Synthesis-oriented
AI Analysis

The paper highlights flaws in conference peer review as a way of assessing research quality. This matters to researchers and the wider academic community because it calls into question the reliance on top-tier publications when evaluating individuals.

The study revisited the 2014 NeurIPS experiment on peer review inconsistency, finding that 50% of variation in reviewer scores was subjective, and for accepted papers, there was no correlation between scores and citation impact, while rejected papers showed such a correlation.

In this paper we revisit the 2014 NeurIPS experiment that examined inconsistency in conference peer review. We determine that 50% of the variation in reviewer quality scores was subjective in origin. Further, with seven years passing since the experiment we find that for accepted papers, there is no correlation between quality scores and impact of the paper as measured as a function of citation count. We trace the fate of rejected papers, recovering where these papers were eventually published. For these papers we find a correlation between quality scores and impact. We conclude that the reviewing process for the 2014 conference was good for identifying poor papers, but poor for identifying good papers. We give some suggestions for improving the reviewing process but also warn against removing the subjective element. Finally, we suggest that the real conclusion of the experiment is that the community should place less onus on the notion of 'top-tier conference publications' when assessing the quality of individual researchers. For NeurIPS 2021, the PCs are repeating the experiment, as well as conducting new ones.

Code Implementations: 1 repo