CLME May 23, 2025

A Position Paper on the Automatic Generation of Machine Learning Leaderboards

arXiv:2505.17465v2, 3 citations, h-index: 3, EMNLP
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of managing and comparing ML research efficiently, but its contribution is incremental: it builds on prior work in order to unify it and guide future efforts.

The paper tackles the challenge of automatically generating machine learning leaderboards from research papers. It provides the first overview of Automatic Leaderboard Generation (ALG) research and proposes a unified framework and benchmarking guidelines to standardize and improve evaluation.

An important task in machine learning (ML) research is comparing prior work, which is often performed via ML leaderboards: tabular overviews of experiments with comparable conditions (e.g., same task, dataset, and metric). However, the growing volume of literature makes these leaderboards difficult to create and maintain. To ease this burden, researchers have developed methods to extract leaderboard entries from research papers for automated leaderboard curation. Yet, prior work varies in problem framing, complicating comparisons and limiting real-world applicability. In this position paper, we present the first overview of Automatic Leaderboard Generation (ALG) research, identifying fundamental differences in assumptions, scope, and output formats. We propose a unified conceptual framework for ALG to standardise how the ALG task is defined. We offer ALG benchmarking guidelines, including recommendations for datasets and metrics that promote fair, reproducible evaluation. Lastly, we outline challenges and new directions for ALG, such as advocating for broader coverage by including all reported results and richer metadata.
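As a rough illustration of the leaderboard notion used in the abstract (results grouped by the same task, dataset, and metric), the following is a minimal Python sketch. It is not the paper's framework or any ALG system: the `Result` fields, the `build_leaderboards` helper, and all entries are hypothetical placeholders, and the sketch assumes results have already been extracted from papers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One reported result. Field names are illustrative assumptions."""
    paper: str
    model: str
    task: str
    dataset: str
    metric: str
    score: float


def build_leaderboards(results, higher_is_better=True):
    """Group results by (task, dataset, metric) and rank each group by score."""
    boards = defaultdict(list)
    for r in results:
        boards[(r.task, r.dataset, r.metric)].append(r)
    return {
        key: sorted(rows, key=lambda r: r.score, reverse=higher_is_better)
        for key, rows in boards.items()
    }


# Placeholder entries, not real reported results.
results = [
    Result("paper-A", "model-1", "NER", "dataset-X", "F1", 90.0),
    Result("paper-B", "model-2", "NER", "dataset-X", "F1", 92.5),
    Result("paper-B", "model-2", "NER", "dataset-Y", "F1", 88.0),
]

boards = build_leaderboards(results)
for (task, dataset, metric), rows in boards.items():
    print(f"{task} / {dataset} / {metric}")
    for rank, r in enumerate(rows, start=1):
        print(f"  {rank}. {r.model} ({r.paper}): {r.score}")
```

Only results that share all three key fields land in the same table, which mirrors the "comparable conditions" requirement. A real pipeline would also need to normalise task, dataset, and metric names before grouping, and to know per metric whether higher or lower scores are better.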

Code Implementations: 1 repo
