CL AI HC IR | May 21, 2025

Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI

arXiv:2505.15031v1 | 6.714 citations | h-index: 6 | Has Code | Scientometrics

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in fine-grained analysis of peer-review reliability at AI conferences, although it is incremental in that it applies existing methods to new data.

This study assesses whether reviewer confidence scores are consistent with review content in AI conference peer review. It finds high text-score consistency at the word, sentence, and aspect levels, and its regression analysis shows that higher confidence scores are associated with paper rejection.
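To make the regression finding concrete, here is a minimal sketch of one way such an association could be tested: a logistic regression of a binary decision (1 = accepted, 0 = rejected) on a reviewer's confidence score, fitted by plain gradient ascent. This is not the authors' model or data. The function name `fit_logistic`, the one-row-per-review setup, and the toy scores below are all hypothetical and chosen only for illustration.

```python
import math


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent.

    Hypothetical helper for illustration; returns (intercept, slope)
    on the original scale of x.
    """
    n = len(xs)
    # Standardize x so that a fixed learning rate behaves reasonably.
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mean) / std for x in xs]
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            # Gradient of the Bernoulli log-likelihood.
            g0 += y - p
            g1 += (y - p) * z
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    # Undo the standardization: b0 + b1*(x - mean)/std.
    return b0 - b1 * mean / std, b1 / std


# Invented toy data: one row per review (confidence 1-5, final decision).
confidence = [2, 2, 3, 3, 3, 4, 4, 4, 5, 5]
accepted = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

intercept, slope = fit_logistic(confidence, accepted)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

A negative `slope` means that, in this toy sample, higher confidence goes with lower odds of acceptance. A real analysis would also report standard errors or p-values (for example with a statistics package) and control for other review features, which this sketch omits.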

Peer review is vital in academia for evaluating research quality. Top AI conferences ask reviewers for confidence scores to help ensure review reliability, but existing studies lack a fine-grained analysis of text-score consistency and may therefore miss key details. This work assesses consistency at the word, sentence, and aspect levels, using deep learning on review data from NLP conferences. We employ deep learning models to detect hedge sentences and review aspects, then analyze report length, the frequency of hedge words and hedge sentences, aspect mentions, and sentiment to evaluate text-score alignment. Correlation, significance, and regression tests examine the impact of confidence scores on paper outcomes. The results show high text-score consistency at all levels, and the regression analysis reveals that higher confidence scores correlate with paper rejection, validating expert assessments and the fairness of peer review.
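The word-level part of this analysis can be sketched as follows. Two simple features are computed per review (report length in tokens and hedge words per 100 tokens), and their association with the confidence score is measured with Spearman's rank correlation. This is a simplified stand-in, not the authors' pipeline: the study detects hedge sentences with deep learning, whereas this sketch uses a small, hypothetical hedge lexicon (`HEDGE_WORDS`), and the reviews and scores are invented. If text and score are consistent, one would expect lower-confidence reviews to hedge more, which shows up as a negative correlation.

```python
import re

# Hypothetical, deliberately tiny hedge lexicon (an assumption, not the paper's list).
HEDGE_WORDS = {"may", "might", "could", "possibly", "perhaps", "seem", "seems",
               "appears", "likely", "unclear", "suggest", "somewhat"}


def word_features(review: str) -> tuple[int, float]:
    """Return (report length in tokens, hedge words per 100 tokens)."""
    tokens = re.findall(r"[a-z']+", review.lower())
    if not tokens:
        return 0, 0.0
    hedges = sum(t in HEDGE_WORDS for t in tokens)
    return len(tokens), 100.0 * hedges / len(tokens)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5


# Invented toy reviews paired with confidence scores (1-5).
reviews = [
    ("The method may work, but the results seem unclear and could possibly be an artifact.", 2),
    ("It appears the gains might be limited; perhaps more baselines would help.", 2),
    ("The proof is correct, though the ablation could be longer.", 4),
    ("Strong, well-executed paper with convincing experiments and clear writing.", 5),
    ("The idea is likely useful but the evaluation seems somewhat thin.", 3),
]

confidence = [c for _, c in reviews]
hedge_rates = [word_features(text)[1] for text, _ in reviews]
rho = spearman(confidence, hedge_rates)
print(f"Spearman rho (confidence vs. hedge rate): {rho:.2f}")
```

Rank correlation is used here because confidence is an ordinal score. Significance testing, sentence-level hedge detection, and aspect or sentiment features are left out of this sketch.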

