Automatic TM Cleaning through MT and POS Tagging: Autodesk's Submission to the NLP4TM 2016 Shared Task
This work addresses translation memory cleaning, a practical problem for NLP practitioners; its contribution is incremental, since it builds directly on prior research (Barbu, 2015).
The paper tackles the problem of identifying incorrect entries in translation memories by extending previous work with recall-based machine translation features and part-of-speech tagging features. The system ranked first in the Binary Classification (II) task for two of the three language pairs (English-Italian and English-Spanish).
We describe a machine-learning-based method to identify incorrect entries in translation memories. It extends previous work by Barbu (2015) with recall-based machine translation features and part-of-speech tagging features. Our system ranked first in the Binary Classification (II) task for two out of three language pairs: English-Italian and English-Spanish.
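
To make the idea of a recall-based machine translation feature concrete, the following is a minimal sketch and not the system's actual implementation. It assumes that the source segment has already been machine-translated by some external engine, and it scores how many of the MT output's tokens also appear in the translation memory target. The function name `unigram_recall`, the whitespace tokenization, the lowercasing, and the choice of the MT output as the reference side are all illustrative assumptions.

```python
from collections import Counter


def unigram_recall(reference: str, candidate: str) -> float:
    """Fraction of reference tokens that also occur in the candidate.

    Hypothetical helper: `reference` is assumed to be the MT output for the
    source segment, `candidate` the target segment stored in the TM.
    Tokenization is naive whitespace splitting on lowercased text.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        # An empty reference carries no evidence; return 0.0 by convention.
        return 0.0
    # Clip each match by the candidate's count so repeated tokens
    # are not credited more often than they actually appear.
    matched = sum(min(count, cand_counts[tok]) for tok, count in ref_counts.items())
    return matched / total


# Illustrative segments, not taken from the shared task data.
mt_output = "apri il file selezionato"
good_target = "Apri il file selezionato"
bad_target = "chiudi la finestra"
print(unigram_recall(mt_output, good_target))
print(unigram_recall(mt_output, bad_target))
```

A score like this would be one numeric feature among several passed to a classifier; a low value suggests that the stored target diverges from what an MT engine produces for the same source, which may indicate an incorrect entry.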