We Should Evaluate Real-World Impact
This work highlights a critical gap in NLP research methodology, one that hinders the practical adoption and usefulness of the technology. It is a foundational critique rather than an incremental improvement.
The paper addresses the lack of real-world impact evaluations in NLP research, finding that only about 0.1% of ACL Anthology papers include such assessments and that most of those present them only sketchily, focusing on metric evaluations instead.
The ACL community has very little interest in evaluating the real-world impact of NLP systems. A structured survey of the ACL Anthology shows that perhaps 0.1% of its papers contain such evaluations; furthermore, most papers that include impact evaluations present them very sketchily and instead focus on metric evaluations. NLP technology would be more useful and more quickly adopted if we seriously tried to understand and evaluate its real-world impact.