CL · Oct 24, 2021

Understanding the Impact of UGC Specificities on Translation Quality

José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

arXiv:2110.12551v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of reliably evaluating machine translation of user-generated content (UGC). The contribution is incremental, as it builds on already known UGC specificities.

The authors tackle the problem of evaluating machine translation for UGC. They show that an average-case score computed with a standard metric on a UGC test set is an unreliable indicator of translation quality, and they introduce a new data set, manually annotated with a fine-grained typology, to measure the impact of individual UGC specificities more precisely.

This work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT. Our analyses show that measuring the average-case performance using a standard metric on a UGC test set falls far short of giving a reliable image of the UGC translation quality. That is why we introduce a new data set for the evaluation of UGC translation in which UGC specificities have been manually annotated using a fine-grained typology. Using this data set, we conduct several experiments to measure the impact of different kinds of UGC specificities on translation quality, more precisely than previously possible.
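To illustrate why a single average can hide the effect of individual specificities, here is a minimal sketch in Python. It is not the authors' method: the sentence-level scores, the label names, and the helper functions `average_score` and `impact_by_label` are all hypothetical and invented for illustration. The sketch compares the corpus-wide mean with the mean-score gap between sentences that carry a given annotation and those that do not.

```python
from collections import defaultdict
from statistics import mean

# Toy, invented records: (sentence-level quality score, annotated specificity labels).
# Scores and label names are placeholders, not the paper's typology or results.
records = [
    (0.80, set()),
    (0.75, {"emoticon"}),
    (0.40, {"misspelling"}),
    (0.35, {"misspelling", "contraction"}),
    (0.70, set()),
    (0.30, {"contraction"}),
]


def average_score(records):
    """Average-case performance: one mean over the whole test set."""
    return mean(score for score, _ in records)


def impact_by_label(records):
    """For each label, mean score of sentences with it minus mean score without it."""
    with_label = defaultdict(list)
    for score, labels in records:
        for label in labels:
            with_label[label].append(score)

    impact = {}
    for label, scores in with_label.items():
        without = [s for s, labels in records if label not in labels]
        if not without:
            # Label occurs in every sentence: no comparison group exists.
            continue
        impact[label] = mean(scores) - mean(without)
    return impact


if __name__ == "__main__":
    print(f"average: {average_score(records):.3f}")
    for label, delta in sorted(impact_by_label(records).items()):
        print(f"{label}: {delta:+.3f}")
```

In this toy setting the overall mean sits in the middle of the range while the per-label gaps differ in size and even in sign, which is the kind of information an average-case score cannot convey. A real analysis would also need to account for sentences carrying several labels at once, which this sketch does not attempt.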
