CL · Oct 8, 2021

The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results

Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger, Yang Gao

arXiv:2110.04392v1

Originality: Synthesis-oriented

AI Analysis

To the authors' knowledge, this is the first shared task on explainable NLP evaluation metrics. It addresses the need of machine translation researchers and practitioners for quality estimation that is interpretable, not just accurate.

The paper introduces the Eval4NLP-2021 shared task on explainable quality estimation for machine translation. Systems must provide both a sentence-level quality score and word-level explanations that identify the words that negatively impact translation quality. The paper presents the data, annotation guidelines, evaluation setup, and the results of the six participating systems.

In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality estimation. Given a source-translation pair, this shared task requires participants not only to provide a sentence-level score indicating the overall quality of the translation, but also to explain this score by identifying the words that negatively impact translation quality. We present the data, annotation guidelines and evaluation setup of the shared task, describe the six participating systems, and analyze the results. To the best of our knowledge, this is the first shared task on explainable NLP evaluation metrics. Datasets and results are available at https://github.com/eval4nlp/SharedTask2021.
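The task's output can be pictured as one sentence-level score plus one explanation score per target word. The abstract does not spell out the metrics, so the following is only a minimal sketch under stated assumptions. It assumes gold annotations mark each target word as an error (1) or not (0) and that higher explanation scores mean "more likely to hurt quality". It then illustrates two common ranking metrics, ROC AUC and recall at top-k, for word-level explanations, and Pearson correlation for sentence-level scores. The `QEOutput` container and all function names are hypothetical and do not reflect the official evaluation scripts.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class QEOutput:
    """Hypothetical output for one source-translation pair."""
    sentence_score: float      # overall translation quality (assumed: higher = better)
    token_scores: List[float]  # one score per target word (assumed: higher = more likely an error)


def recall_at_top_k(token_scores: List[float], gold: List[int]) -> float:
    """Fraction of gold error words among the k highest-scored words, k = number of gold errors."""
    k = sum(gold)
    if k == 0:
        raise ValueError("no gold error words: recall at top-k is undefined")
    # Rank word positions by descending explanation score.
    ranked = sorted(range(len(token_scores)), key=lambda i: token_scores[i], reverse=True)
    return sum(gold[i] for i in ranked[:k]) / k


def auc(token_scores: List[float], gold: List[int]) -> float:
    """ROC AUC as the probability that a random error word outscores a random correct word."""
    pos = [s for s, g in zip(token_scores, gold) if g == 1]
    neg = [s for s, g in zip(token_scores, gold) if g == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one error word and one correct word")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0  # ties count as half
    return wins / (len(pos) * len(neg))


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between predicted and gold sentence-level scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Toy example: 4 target words, words 1 and 2 annotated as errors.
    pred = QEOutput(sentence_score=0.4, token_scores=[0.1, 0.7, 0.3, 0.5])
    gold = [0, 1, 1, 0]
    print(recall_at_top_k(pred.token_scores, gold))  # 0.5
    print(auc(pred.token_scores, gold))              # 0.75
```

In the toy example, the explanation ranks one true error word (0.7) first but places a correct word (0.5) above the second error word (0.3). That gives recall at top-2 of 0.5 and an AUC of 0.75. Ranking metrics like these reward explanations that put error words first, regardless of the absolute scale of the scores.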

