CL · Apr 23, 2020

Correct Me If You Can: Learning from Error Corrections and Markings

Julia Kreutzer, Nathaniel Berger, Stefan Riezler

arXiv:2004.11222v1 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the high cost of annotating machine translation data and offers a cheaper alternative for researchers and practitioners. It is incremental in that it builds on existing annotation methods.

The study examines the trade-off between annotation cost and signal strength in sequence-to-sequence learning by evaluating error markings for machine translation. It shows that error markings require significantly less human effort than corrections and can be used to fine-tune neural models successfully.

Sequence-to-sequence learning involves a trade-off between signal strength and annotation cost of training data. For example, machine translation data range from costly expert-generated translations that enable supervised learning to weak quality-judgment feedback that facilitates reinforcement learning. We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings. We show that error markings for translations of TED talks from English to German allow precise credit assignment while requiring significantly less human effort than correcting/post-editing, and that error-marked data can be used successfully to fine-tune neural machine translation models.
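To make "precise credit assignment" concrete, here is a minimal sketch of one way token-level error markings could feed into fine-tuning. It is an illustration under stated assumptions, not the paper's exact objective: we assume a weighted log-likelihood where unmarked tokens are treated as correct (positive weight) and marked tokens receive a negative weight. The function name `marking_weighted_nll` and the weight values are hypothetical.

```python
import math
from typing import Sequence


def marking_weighted_nll(
    token_log_probs: Sequence[float],
    marked: Sequence[bool],
    weight_correct: float = 1.0,   # hypothetical weight for unmarked (correct) tokens
    weight_marked: float = -0.5,   # hypothetical negative weight for marked (erroneous) tokens
) -> float:
    """Token-level weighted negative log-likelihood for one model translation.

    Assumption (not taken from the paper): each target token carries a binary
    error marking. Minimizing the loss raises the probability of unmarked
    tokens and lowers the probability of marked ones, so the learning signal
    is assigned to individual tokens rather than to the whole sentence.
    """
    if len(token_log_probs) != len(marked):
        raise ValueError("need one marking flag per target token")
    loss = 0.0
    for log_p, is_marked in zip(token_log_probs, marked):
        weight = weight_marked if is_marked else weight_correct
        loss -= weight * log_p
    return loss


# Usage: a three-token output where the annotator marked only the middle token.
log_probs = [math.log(0.9), math.log(0.2), math.log(0.8)]
markings = [False, True, False]
print(marking_weighted_nll(log_probs, markings))
```

The negative weight is what distinguishes this from plain supervised training on the model's own output: a sentence-level quality score would spread one reward over all tokens, while markings let the loss push down only the tokens the annotator flagged. Because the marked-token term is unbounded below, a practical setup would likely normalize or clip it.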

