IR · Apr 20

MasterSet: A Large-Scale Benchmark for Must-Cite Citation Recommendation in the AI/ML Literature

arXiv:2604.17680 · 34.2 · h-index: 8
AI Analysis

For researchers and practitioners in AI/ML, this benchmark addresses the need for identifying critical citations that ensure reproducibility and proper attribution, but the contribution is incremental as it extends existing citation recommendation to a specific subset.

The paper introduces MasterSet, a large-scale benchmark with over 150,000 papers from 15 AI/ML venues for must-cite citation recommendation, using a three-tier annotation scheme. Baselines show the task is challenging, with best methods achieving limited recall.

The explosive growth of AI and machine learning literature, with venues like NeurIPS and ICLR now accepting thousands of papers annually, has made comprehensive citation coverage increasingly difficult for researchers. While citation recommendation has been studied for over a decade, existing systems primarily focus on broad relevance rather than identifying the critical set of "must-cite" papers: direct experimental baselines, foundational methods, and core dependencies whose omission would misrepresent a contribution's novelty or undermine reproducibility. We introduce MasterSet, a large-scale benchmark specifically designed to evaluate must-cite recommendation in the AI/ML domain. MasterSet incorporates over 150,000 papers collected from official conference proceedings/websites of 15 leading venues, serving as a comprehensive candidate pool for retrieval. We annotate citations with a three-tier labeling scheme: (I) experimental baseline status, (II) core relevance (1-5 scale), and (III) intra-paper mention frequency. Our annotation pipeline leverages an LLM-based judge, validated by human experts on a stratified sample. The benchmark task requires retrieving must-cite papers from the candidate pool given only a query paper's title and abstract, evaluated by Recall@K. We establish baselines using sparse retrieval, dense scientific embeddings, and graph-based methods, demonstrating that must-cite retrieval remains a challenging open problem.
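
To make the evaluation metric concrete, here is a minimal sketch of Recall@K for a single query paper. It assumes the must-cite set is already given as a set of paper IDs; the abstract does not state how the three annotation tiers are combined into that set, so no thresholds are applied here. The function name, the paper IDs, and the handling of queries with no must-cite labels are all illustrative assumptions, not part of the benchmark's released code.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], must_cite: Set[str], k: int) -> float:
    """Fraction of must-cite papers that appear among the top-k retrieved candidates."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not must_cite:
        # Assumption: a query with no must-cite labels scores 0.0.
        # A real evaluation might instead skip such queries.
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & must_cite) / len(must_cite)


# Hypothetical paper IDs, for illustration only.
ranked = ["p7", "p2", "p9", "p4", "p1"]  # retriever output, best first
gold = {"p2", "p4", "p8"}                # must-cite papers for this query

print(recall_at_k(ranked, gold, k=3))  # one of three must-cite papers in the top 3
print(recall_at_k(ranked, gold, k=5))  # two of three must-cite papers in the top 5
```

A benchmark-level score would typically average this value over all query papers, though the abstract does not specify the aggregation.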

