LG | DL | Oct 31, 2024

Benchmark Data Repositories for Better Benchmarking

arXiv:2410.24100v1 | 20 citations | h-index: 4 | NIPS
Originality: Synthesis-oriented
AI Analysis

This work addresses flawed benchmarking practices in machine learning research, but it is incremental: it builds on existing critiques without introducing new methods or datasets.

The paper tackles the problem of inadequate attention to benchmark data repositories in machine learning, analyzing their role in addressing issues like representational harms and lack of reproducibility, and proposes design considerations to improve benchmarking practices.

In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for -- and levies criticisms at -- data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are stored, documented, and shared. In this paper, we analyze the landscape of these benchmark data repositories and the role they can play in improving benchmarking. This role includes addressing issues with both datasets themselves (e.g., representational harms, construct validity) and the manner in which evaluation is carried out using such datasets (e.g., overemphasis on a few datasets and metrics, lack of reproducibility). To this end, we identify and discuss a set of considerations surrounding the design and use of benchmark data repositories, with a focus on improving benchmarking practices in machine learning.

