CL · AI · Nov 3, 2021

Lingua Custodia's participation at the WMT 2021 Machine Translation using Terminologies shared task

arXiv:2111.02120v1649 citations
Originality: Incremental advance
AI Analysis

This work addresses accurate terminology use in machine translation for three directions (English to French, Russian, and Chinese) and represents an incremental advance over existing methods.

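The paper tackles the problem of incorporating terminological constraints into machine translation. It keeps a Transformer-based architecture as the building block and changes the standard procedure in two ways: training-data augmentation that encourages copy behavior, and constraint token masking. The reported results satisfy most terminology constraints while maintaining high translation quality.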

This paper describes Lingua Custodia's submission to the WMT21 shared task on machine translation using terminologies. We consider three directions, namely English to French, Russian, and Chinese. We rely on a Transformer-based architecture as a building block, and we explore a method which introduces two main changes to the standard procedure to handle terminologies. The first one consists in augmenting the training data in such a way as to encourage the model to learn a copy behavior when it encounters terminology constraint terms. The second change is constraint token masking, whose purpose is to ease copy behavior learning and to improve model generalization. Empirical results show that our method satisfies most terminology constraints while maintaining high translation quality.
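The two changes described in the abstract can be illustrated with a minimal sketch. The abstract does not give the annotation format, so everything concrete here is an assumption made for illustration: the marker tokens `<t>`, `<c>`, `</c>`, the `<mask>` token, the function name `annotate`, and the restriction to single-token source terms. It shows one plausible way to inline a target-side term next to its source term (augmentation) and to hide the source term behind a mask token (constraint token masking); it is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical marker tokens: the source does not specify the actual format.
TERM_START, TERM_SEP, TERM_END, MASK = "<t>", "<c>", "</c>", "<mask>"


def annotate(source: List[str], constraints: Dict[str, str], mask: bool = True) -> List[str]:
    """Inline each target-side constraint after its matching source token.

    Only single-token source terms are handled, to keep the sketch short.
    """
    out: List[str] = []
    for tok in source:
        target = constraints.get(tok)
        if target is None:
            # Tokens without a constraint pass through unchanged.
            out.append(tok)
            continue
        # With masking on, the source term is replaced by a mask token, so the
        # inlined target term is the only lexical cue left to copy from.
        shown = MASK if mask else tok
        out += [TERM_START, shown, TERM_SEP, *target.split(), TERM_END]
    return out


example = annotate("the patient has pneumonia".split(), {"pneumonia": "pneumonie"})
print(" ".join(example))  # the patient has <t> <mask> <c> pneumonie </c>
```

In a setup like this, the annotated source would be paired with a reference translation that contains the constraint term, so that a model trained on such pairs can learn to copy the inlined term into its output.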
