CLAIIR | Feb 19, 2025

PeerQA: A Scientific Question Answering Dataset from Peer Reviews

arXiv:2502.13668v1 | 23 citations | h-index: 15 | Has Code | NAACL
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the need for practical QA systems in scientific domains by providing a real-world benchmark from peer reviews, though it is incremental as it builds on existing QA frameworks with new data.

The authors introduced PeerQA, a scientific question-answering dataset sourced from peer reviews, containing 579 QA pairs from 208 academic papers, primarily in ML and NLP. They established baselines for evidence retrieval, unanswerable question classification, and answer generation, finding that decontextualization improves retrieval performance and highlighting the dataset's challenge for long-context modeling with papers averaging 12k tokens.

We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP, as well as a subset from other scientific communities like Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval, where we find that even simple decontextualization approaches consistently improve retrieval performance across architectures. On answer generation, PeerQA serves as a challenging benchmark for long-context modeling, as the papers have an average size of 12k tokens. Our code and data are available at https://github.com/UKPLab/peerqa.
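
The abstract does not say which decontextualization approaches were used. As a minimal sketch of the general idea, assuming that "simple decontextualization" can mean prepending the paper title to each paragraph before scoring it against the question, the toy example below uses a hypothetical token-overlap scorer (not one of the paper's retrieval baselines). The helper names `decontextualize`, `overlap_score`, and `rank` and the sample paragraphs are illustrative only and do not come from the PeerQA repository.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, passage):
    """Fraction of query tokens that also occur in the passage (toy scorer)."""
    q = tokenize(query)
    return len(q & tokenize(passage)) / len(q) if q else 0.0


def decontextualize(title, paragraphs):
    """Assumed strategy: prepend the paper title so each paragraph stands alone."""
    return [f"{title}. {p}" for p in paragraphs]


def rank(query, passages):
    """Return passage indices ordered from best to worst overlap score."""
    return sorted(
        range(len(passages)),
        key=lambda i: overlap_score(query, passages[i]),
        reverse=True,
    )


# Illustrative inputs, paraphrased from the abstract above.
title = "PeerQA: A Scientific Question Answering Dataset from Peer Reviews"
paragraphs = [
    "It contains 579 QA pairs from 208 academic articles.",
    "Baselines are established for all three tasks.",
]
query = "How many QA pairs does PeerQA contain?"

plain_scores = [overlap_score(query, p) for p in paragraphs]
decon_scores = [overlap_score(query, p) for p in decontextualize(title, paragraphs)]
print(plain_scores, decon_scores)
```

The first paragraph refers to the dataset only as "It", so the plain version cannot match the query term "PeerQA"; once the title is prepended, that term becomes matchable. This shows the mechanism only and says nothing about the size of the improvement reported in the paper.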

Code Implementations: 1 repo
