DL CL Mar 26

RenoBench: A Citation Parsing Benchmark

Parth Sarin, Juan Pablo Alperin, Adam Buttrick, Dione Mentis

arXiv:2603.25640 57.7 h-index: 5

AI Analysis

RenoBench provides a reproducible, standardized benchmark for citation parsing systems, which benefits researchers and developers working on scholarly infrastructure. The contribution is incremental, since it builds on existing interest in the problem.

The authors address the lack of generalizable, publicly available evaluation for citation parsing by introducing RenoBench, a benchmark of 10,000 citations sampled from 161,000 annotated citations across four publishing ecosystems. They find that language models, particularly fine-tuned ones, achieve strong field-level precision and recall.

Accurate parsing of citations is necessary for machine-readable scholarly infrastructure. However, despite sustained interest in this problem, existing evaluation techniques are often not generalizable, are based on synthetic data, or are not publicly available. We introduce RenoBench, a public domain benchmark for citation parsing, sourced from PDFs released on four publishing ecosystems: SciELO, Redalyc, the Public Knowledge Project, and Open Research Europe. Starting from 161,000 annotated citations, we apply automated validation and feature-based sampling to produce a dataset of 10,000 citations spanning multiple languages, publication types, and platforms. We then evaluate a variety of citation parsing systems and report field-level precision and recall. Our results show strong performance from language models, particularly when fine-tuned. RenoBench enables reproducible, standardized evaluation of citation parsing systems, and provides a foundation for advancing automated citation parsing and metascientific research.
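To make "field-level precision and recall" concrete, here is a minimal sketch of one way such scores could be computed. It is not the RenoBench evaluation code: the field names, the `normalize` helper, and the exact-match-after-normalization rule are all assumptions made for illustration, and the benchmark may match fields differently.

```python
from collections import Counter


def normalize(value: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(value.lower().split())


def field_scores(gold, pred):
    """Return {field: (precision, recall)} for aligned lists of parsed citations.

    gold and pred are lists of dicts mapping a field name to a string value.
    A predicted field counts as correct only if it matches the gold value
    exactly after normalization (an assumption, not the paper's stated rule).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for field in set(g) | set(p):
            gv, pv = g.get(field), p.get(field)
            if gv is not None and pv is not None and normalize(gv) == normalize(pv):
                tp[field] += 1
            else:
                if pv is not None:
                    fp[field] += 1  # predicted a value that is wrong or spurious
                if gv is not None:
                    fn[field] += 1  # gold value was missed or mispredicted
    scores = {}
    for field in set(tp) | set(fp) | set(fn):
        predicted = tp[field] + fp[field]
        actual = tp[field] + fn[field]
        precision = tp[field] / predicted if predicted else 0.0
        recall = tp[field] / actual if actual else 0.0
        scores[field] = (precision, recall)
    return scores


# Made-up example data, not drawn from the benchmark.
gold = [{"title": "A Study", "year": "2020"}, {"title": "B", "year": "2019"}]
pred = [{"title": "a study ", "year": "2021"}, {"title": "B"}]
scores = field_scores(gold, pred)
```

In this toy input both titles match after normalization, while the year is wrong in the first citation and missing in the second, so the year field gets no credit for either precision or recall. Scoring each field separately shows which parts of a citation a parser handles well, which a single per-citation accuracy number would hide.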
