CL | Jun 21, 2023

Towards Accurate Translation via Semantically Appropriate Application of Lexical Constraints

Yujin Baek, Koanho Lee, Dayeon Ki, Hyoung-Gyu Lee, Cheonbok Park, Jaegul Choo

arXiv:2306.12089v1 | 26.3224 citations | h-index: 44 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses practical issues in machine translation for users needing accurate terminology integration, though it is incremental as it builds on existing LNMT methods.

The paper tackles lexically-constrained neural machine translation under real-world conditions, specifically lexical constraints that are homographs or that were unseen during training. It introduces a homograph disambiguation module, the PLUMCOT model, and the HOLLY benchmark, and reports that PLUMCOT's gains are most pronounced on unseen constraints.

Lexically-constrained NMT (LNMT) aims to incorporate user-provided terminology into translations. Despite its practical advantages, existing work has not evaluated LNMT models under challenging real-world conditions. In this paper, we focus on two important but under-studied issues that lie in the current evaluation process of LNMT studies. The model needs to cope with challenging lexical constraints that are "homographs" or "unseen" during training. To this end, we first design a homograph disambiguation module to differentiate the meanings of homographs. Moreover, we propose PLUMCOT, which integrates contextually rich information about unseen lexical constraints from pre-trained language models and strengthens a copy mechanism of the pointer network via direct supervision of a copying score. We also release HOLLY, an evaluation benchmark for assessing the ability of a model to cope with "homographic" and "unseen" lexical constraints. Experiments on HOLLY and the previous test setup show the effectiveness of our method. The effects of PLUMCOT are shown to be remarkable in "unseen" constraints. Our dataset is available at https://github.com/papago-lab/HOLLY-benchmark
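The copy mechanism mentioned in the abstract can be illustrated with a generic pointer-style mixture. The sketch below is not the authors' implementation: it assumes a single scalar copy gate per decoding step, a toy vocabulary, and a binary cross-entropy term as one plausible form of "direct supervision of a copying score". The names `mix_distributions`, `copy_gate_loss`, and all example values are hypothetical.

```python
import math


def mix_distributions(p_gen, attn, constraint_tokens, copy_gate):
    """Blend a generation distribution with a copy distribution.

    p_gen: dict mapping vocabulary token -> probability (sums to 1).
    attn: attention weights over constraint tokens (sums to 1).
    copy_gate: scalar in [0, 1]; the share of mass given to copying.
    """
    mixed = {tok: (1.0 - copy_gate) * p for tok, p in p_gen.items()}
    for tok, weight in zip(constraint_tokens, attn):
        # Copied tokens may lie outside the vocabulary, hence .get().
        mixed[tok] = mixed.get(tok, 0.0) + copy_gate * weight
    return mixed


def copy_gate_loss(copy_gate, should_copy):
    """Binary cross-entropy on the gate (assumed form of supervision).

    should_copy is True when the reference token at this step belongs
    to a lexical constraint, so the gate is pushed toward 1.
    """
    eps = 1e-12  # guards against log(0)
    target = 1.0 if should_copy else 0.0
    return -(target * math.log(copy_gate + eps)
             + (1.0 - target) * math.log(1.0 - copy_gate + eps))


# Toy step: the constraint token "Ufer" is absent from the vocabulary.
p_gen = {"bank": 0.6, "river": 0.4}
mixed = mix_distributions(p_gen, [1.0], ["Ufer"], copy_gate=0.5)
loss_confident = copy_gate_loss(0.9, should_copy=True)
loss_hesitant = copy_gate_loss(0.1, should_copy=True)
```

In this toy setup the unseen constraint token receives probability mass only through the copy path, which is why supervising the gate directly could matter for unseen constraints.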
