CL | AI | LG | Dec 20, 2022

Evaluation for Change

arXiv:2212.11670v1224 citations | h-index: 12
Originality: Synthesis-oriented
AI Analysis

This is a conceptual position paper that critiques current evaluation practices in NLP, aiming to influence how the research community uses evaluation to shape the field.

The paper argues that evaluation in NLP should be seen as a force for driving change, with a sociological and political character, and suggests that its power is waning despite its potential for promoting pluralistic goals in the field.

Evaluation is the central means for assessing, understanding, and communicating about NLP models. In this position paper, we argue evaluation should be more than that: it is a force for driving change, carrying a sociological and political character beyond its technical dimensions. As a force, evaluation's power arises from its adoption: under our view, evaluation succeeds when it achieves the desired change in the field. Further, by framing evaluation as a force, we consider how it competes with other forces. Under our analysis, we conjecture that the current trajectory of NLP suggests evaluation's power is waning, in spite of its potential for realizing more pluralistic ambitions in the field. We conclude by discussing the legitimacy of this power, who acquires this power and how it distributes. Ultimately, we hope the research community will more aggressively harness evaluation for change.

