CoDA21: Evaluating Language Understanding Capabilities of NLP Models With Context-Definition Alignment
CoDA21 provides a new benchmark for evaluating NLP models and addresses a gap in existing tasks, although it is an incremental contribution to benchmark development.
The authors tackled the need for more challenging benchmarks for pretrained language models by introducing CoDA21, a context-definition alignment task that measures natural language understanding, and found a large performance gap between humans and models.
Pretrained language models (PLMs) have achieved superhuman performance on many benchmarks, creating a need for harder tasks. We introduce CoDA21 (Context Definition Alignment), a challenging benchmark that measures natural language understanding (NLU) capabilities of PLMs: given a definition and a context for each of k words, but not the words themselves, the task is to align the k definitions with the k contexts. CoDA21 requires a deep understanding of contexts and definitions, including complex inference and world knowledge. We find a large gap between human and PLM performance, suggesting that CoDA21 measures an aspect of NLU that is not sufficiently covered in existing benchmarks.
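Because each of the k definitions must be matched to exactly one of the k contexts, the alignment can be framed as a one-to-one assignment problem. The following is a minimal sketch, assuming some model has already produced a score for every (context, definition) pair. The score matrix, the function names `best_alignment` and `alignment_accuracy`, and the accuracy metric are illustrative assumptions, not the benchmark's official scoring code.

```python
from itertools import permutations
from typing import Sequence, Tuple


def best_alignment(scores: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Return the permutation p that maximizes sum(scores[i][p[i]]).

    scores[i][j] is a hypothetical model score for pairing context i with
    definition j. This searches all k! permutations exhaustively, so it is
    only practical for small k.
    """
    k = len(scores)
    if any(len(row) != k for row in scores):
        raise ValueError("scores must be a k x k matrix")
    best_perm: Tuple[int, ...] = tuple(range(k))
    best_total = float("-inf")
    for perm in permutations(range(k)):
        total = sum(scores[i][perm[i]] for i in range(k))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_perm


def alignment_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of contexts assigned to their correct definition (assumed metric)."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy 3x3 score matrix with made-up numbers, not CoDA21 data.
    # Rows 0 and 1 both score highest on definition 0, so a per-row argmax
    # would assign definition 0 twice; the one-to-one search resolves this.
    scores = [
        [0.9, 0.2, 0.1],
        [0.8, 0.3, 0.6],
        [0.1, 0.7, 0.2],
    ]
    gold = [0, 2, 1]  # index of the true definition for each context
    pred = best_alignment(scores)
    print(pred, alignment_accuracy(pred, gold))
```

Running the script prints `(0, 2, 1) 1.0`. Exhaustive search grows as k!, so for larger k a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) is the usual substitute; it returns the same optimal one-to-one alignment in polynomial time.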