Understanding the Impact of UGC Specificities on Translation Quality
This work addresses the challenge of reliably evaluating machine translation of user-generated content. The contribution is incremental, since it builds on UGC specificities that are already well known.
The authors tackle the evaluation of machine translation for user-generated content (UGC). They show that average-case scores from a standard metric are unreliable, and they introduce a new annotated data set that measures the impact of UGC specificities on translation quality more precisely.
This work takes a critical look at the evaluation of the automatic translation of user-generated content (UGC), the well-known specificities of which raise many challenges for MT. Our analyses show that measuring average-case performance with a standard metric on a UGC test set falls far short of giving a reliable picture of UGC translation quality. That is why we introduce a new data set for the evaluation of UGC translation, in which UGC specificities have been manually annotated using a fine-grained typology. Using this data set, we conduct several experiments to measure the impact of different kinds of UGC specificities on translation quality more precisely than was previously possible.
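To illustrate why a single average can hide the effect of individual specificities, here is a minimal sketch in Python. It is not the authors' evaluation code. It assumes each test sentence already has a sentence-level quality score from some metric and a list of annotated specificity labels. The function name `score_by_specificity`, the labels "typo" and "emoji", and all scores are hypothetical, made-up values chosen for illustration, not the paper's typology or results.

```python
from collections import defaultdict
from statistics import mean


def score_by_specificity(records):
    """Return (overall_mean, {label: mean score of sentences carrying that label}).

    `records` is a list of (score, labels) pairs. Sentences with no
    annotated specificity are grouped under the label "none".
    """
    overall = mean(score for score, _ in records)
    buckets = defaultdict(list)
    for score, labels in records:
        # A sentence with several labels contributes to each of its buckets.
        for label in labels or ["none"]:
            buckets[label].append(score)
    per_label = {label: mean(scores) for label, scores in buckets.items()}
    return overall, per_label


# Toy, made-up scores and hypothetical labels (assumption, not paper data).
toy_records = [
    (80, []),
    (60, ["typo"]),
    (40, ["typo", "emoji"]),
    (20, ["emoji"]),
]

overall, per_label = score_by_specificity(toy_records)
print("overall:", overall)
for label, value in sorted(per_label.items()):
    print(label, value)
```

In this toy setting the overall mean sits between the clean sentences and the sentences carrying a given specificity, so the aggregate alone says nothing about which phenomenon hurts most. A per-label breakdown of this kind is only possible when the test set carries specificity annotations.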