Lingua Custodia at WMT'19: Attempts to Control Terminology
This work addresses the translation of domain-specific entities, such as the names of political parties and people, in a machine translation shared task setting. Its contribution is incremental, since it builds on existing methods.
The paper tackles the adaptation of a machine translation system's terminology for German-to-French on the topic of the EU elections, without in-domain parallel training data. The resulting WMT'19 shared task submission uses backtranslation with constrained decoding to ensure that specific terms are translated correctly.
This paper describes Lingua Custodia's submission to the WMT'19 news shared task for German-to-French on the topic of the EU elections. We report experiments on adapting the terminology of a machine translation system to a specific topic, with the aim of providing more accurate translations of specific entities such as political parties and person names, given that the shared task provided no in-domain parallel training data on this restricted topic. Our primary submission to the shared task uses backtranslation generated with a type of decoding that allows constraints to be inserted in the output, in order to guarantee the correct translation of specific terms that are not necessarily observed in the data.
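
To make the idea of constrained backtranslation more concrete, the following is a minimal sketch and not the system described in the paper. It assumes a hypothetical French-to-German term list (`TERMS_FR_DE`) and shows only the bookkeeping around a constrained decoder: selecting which German terms should be forced into the backtranslation of a French monolingual sentence, and checking that a candidate output contains them. The decoder itself is out of scope here, and the function names `collect_constraints` and `satisfies_constraints` are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical French-to-German term list; the entries are illustrative only
# and are not taken from the paper.
TERMS_FR_DE: Dict[str, str] = {
    "Parlement européen": "Europäisches Parlament",
    "Les Verts": "Die Grünen",
}


def collect_constraints(french_sentence: str, terms: Dict[str, str]) -> List[str]:
    """Return the German terms to impose as output constraints when
    backtranslating this French sentence into German."""
    lowered = french_sentence.lower()
    # Plain case-insensitive substring matching; a real system would likely
    # need tokenization and handling of inflected forms.
    return [de for fr, de in terms.items() if fr.lower() in lowered]


def satisfies_constraints(german_output: str, constraints: List[str]) -> bool:
    """Check that every constraint appears verbatim in the German output."""
    lowered = german_output.lower()
    return all(c.lower() in lowered for c in constraints)
```

In such a setup, the constraints returned by `collect_constraints` would be passed to a decoder that supports lexical constraints, and the resulting German output would be paired with the original French sentence as synthetic training data. Note that verbatim matching, as in `satisfies_constraints`, does not account for German inflection (for example, "Europäischen Parlament"), which is one reason this remains a sketch.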