AI CL CV · Mar 9, 2022

Mapping global dynamics of benchmark creation and saturation in artificial intelligence

Simon Ott, Adriano Barbosa-Silva, Kathrin Blagec, Jan Brauner, Matthias Samwald

arXiv:2203.04592v4 · 31.388 citations · h-index: 35

Originality: Synthesis-oriented

AI Analysis

This work gives AI researchers and practitioners a way to monitor benchmark health, though its contribution is incremental: it builds on prior concerns about benchmark overfitting and centralization.

The study analyzed 3,765 benchmarks in computer vision and natural language processing, finding that many quickly trend toward near-saturation, fail to gain widespread use, and exhibit unpredictable performance bursts, highlighting issues in the AI benchmarking ecosystem.

Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curated data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trended towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks were prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility.
