CL AI Mar 20, 2023

Self-Improving-Leaderboard(SIL): A Call for Real-World Centric Natural Language Processing Leaderboards

Chanjun Park, Hyeonseok Moon, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim

arXiv:2303.10888v10.92 citations h-index: 13

Originality: Incremental advance

AI Analysis

This paper addresses a foundational issue in NLP evaluation for researchers and practitioners, aiming to shift leaderboards towards real-world-centric evaluation. The advance is incremental because it builds on existing leaderboard concepts.

The paper tackles the problem that current NLP leaderboards focus on static test sets, which may not reflect real-world performance, and proposes a new paradigm for leaderboard systems to address issues like test set bias and real-world discrepancies.

Leaderboard systems allow researchers to objectively evaluate Natural Language Processing (NLP) models and are typically used to identify models that exhibit superior performance on a given task in a predetermined setting. However, we argue that evaluation on a given test dataset is just one of many indicators of a model's performance. In this paper, we claim that leaderboard competitions should also aim to identify models that exhibit the best performance in a real-world setting. We highlight three issues with current leaderboard systems: (1) the use of a single, static test set, (2) the discrepancy between testing and real-world application, and (3) the tendency for leaderboard-centric competition to be biased towards the test set. As a solution, we propose a new paradigm of leaderboard systems that addresses these issues. Through this study, we hope to induce a paradigm shift towards more real-world-centric leaderboard competitions.
