CL · Oct 16, 2019

Fine-grained evaluation of German-English Machine Translation based on a Test Suite

Vivien Macketanz, Eleftherios Avramidis, Aljoscha Burchardt, Hans Uszkoreit

arXiv:1910.07460v1

Originality: Synthesis-oriented

AI Analysis

This work provides a fine-grained evaluation method for machine translation and addresses the need for more detailed assessment in the field. The contribution is incremental, however, since it applies existing evaluation techniques to a new dataset.

The researchers evaluated 16 state-of-the-art German-English machine translation systems with a linguistically motivated test suite of 5,000 sentences covering 106 phenomena. The analysis reveals system-specific strengths and weaknesses and identifies phenomena where overall performance is low.

We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.
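The abstract states that MT outputs are scored semi-automatically with regular expressions that look only at the part of the sentence relevant to each phenomenon, and that systems are then compared per category. The Python sketch below illustrates that idea under stated assumptions. The `TestItem` structure, the positive/negative pattern lists, the rule that unmatched or conflicting outputs are left for manual inspection, and the example sentences and category labels are all hypothetical and are not taken from the authors' test suite or tooling.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TestItem:
    """One test sentence; this structure is hypothetical, not the authors' format."""
    category: str    # illustrative category label
    phenomenon: str  # illustrative phenomenon label
    source: str      # German source sentence
    positive: list[str] = field(default_factory=list)  # patterns of acceptable translations
    negative: list[str] = field(default_factory=list)  # patterns of known wrong translations


def evaluate(item: TestItem, output: str) -> str:
    """Return 'pass', 'fail', or 'unknown' (to be checked by a human)."""
    pos = any(re.search(p, output) for p in item.positive)
    neg = any(re.search(p, output) for p in item.negative)
    if pos and not neg:
        return "pass"
    if neg and not pos:
        return "fail"
    return "unknown"  # no pattern matched, or positive and negative patterns conflict


def category_accuracy(items: list[TestItem], outputs: list[str]) -> dict[str, float]:
    """Percentage of passes per category, over automatically decided items only."""
    passed: dict[str, int] = defaultdict(int)
    decided: dict[str, int] = defaultdict(int)
    for item, output in zip(items, outputs):
        verdict = evaluate(item, output)
        if verdict == "unknown":
            continue  # would go to manual annotation in a semi-automatic setup
        decided[item.category] += 1
        if verdict == "pass":
            passed[item.category] += 1
    return {cat: 100.0 * passed[cat] / decided[cat] for cat in decided}


# Hypothetical example items (not from the actual test suite).
items = [
    TestItem(
        category="Verb tense/aspect/mood",
        phenomenon="Future perfect",
        source="Er wird das Buch gelesen haben.",
        positive=[r"\bwill have read\b"],
        negative=[r"\bwill read\b", r"\b(has|had) read\b"],
    ),
    TestItem(
        category="Negation",
        phenomenon="Negation with 'kein'",
        source="Ich habe keine Zeit.",
        positive=[r"\b(no time|(do not|don't) have (any )?time)\b"],
        negative=[r"^I have time\b"],
    ),
]

# Hypothetical outputs of three MT systems for the two items above.
system_a = ["He will have read the book.", "I don't have time."]
system_b = ["He will read the book.", "I have time."]
system_c = ["He read the book.", "I have no time."]

for name, outputs in [("A", system_a), ("B", system_b), ("C", system_c)]:
    print(name, category_accuracy(items, outputs))
```

Here system A passes both items, system B fails both, and system C's tense output matches no pattern, so it is left undecided and only the negation category is scored. Computing accuracy over automatically decided items is just one possible choice; in a semi-automatic workflow the undecided outputs would instead be judged by a human and added back before systems are compared.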

