CL AI Jan 7

NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning

arXiv:2601.03790v1
2 citations
h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses neologism-aware machine translation, a task that remains underexplored compared with general machine translation. The contribution is an incremental advance.

The paper tackles the translation of sentences containing neologisms by proposing NeoAMT, an agentic framework that combines a Wiktionary search tool with reinforcement learning. It also introduces a new dataset covering 16 languages and 75 translation directions, derived from approximately 10 million records of an English Wiktionary dump.

Neologism-aware machine translation aims to translate source sentences containing neologisms into target languages. This field remains underexplored compared with general machine translation (MT). In this paper, we propose an agentic framework, NeoAMT, for neologism-aware machine translation using a Wiktionary search tool. Specifically, we first create a new dataset for neologism-aware machine translation and develop a search tool based on Wiktionary. The new dataset covers 16 languages and 75 translation directions and is derived from approximately 10 million records of an English Wiktionary dump. The retrieval corpus of the search tool is constructed from around 3 million cleaned records of the same Wiktionary dump. We then use the dataset to train the translation agent with reinforcement learning (RL) and to evaluate the accuracy of neologism-aware machine translation. Building on this, we also propose an RL training framework with a novel reward design and an adaptive rollout generation approach that leverages "translation difficulty" to further improve the translation quality of agents using our search tool.
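
The idea of giving a translation agent a dictionary search tool can be illustrated with a small sketch. This is not the paper's implementation: the `Entry` schema, the toy `CORPUS`, the matching rule in `search`, and the prompt layout in `build_prompt` are all hypothetical, and the sketch omits the language model and the RL training entirely. It only shows how retrieved glosses for candidate neologisms might be attached to a translation request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One dictionary record: a headword and a short gloss (hypothetical schema)."""
    term: str
    gloss: str


# Toy stand-in for a retrieval corpus; the entries are illustrative only.
CORPUS = [
    Entry("doomscrolling", "compulsively reading negative news online"),
    Entry("rizz", "charm or skill at attracting a romantic partner"),
    Entry("scroll", "to move through text on a screen"),
]


def search(query: str, corpus: list[Entry], top_k: int = 2) -> list[Entry]:
    """Return up to top_k entries: exact headword matches first, then substring matches."""
    q = query.lower()
    exact = [e for e in corpus if e.term.lower() == q]
    partial = [
        e for e in corpus
        if e.term.lower() != q and (q in e.term.lower() or e.term.lower() in q)
    ]
    return (exact + partial)[:top_k]


def build_prompt(sentence: str, candidates: list[str], target_lang: str,
                 corpus: list[Entry]) -> str:
    """Look up each candidate term and prepend any glosses found to the request."""
    notes = []
    for term in candidates:
        for entry in search(term, corpus):
            notes.append(f"- {entry.term}: {entry.gloss}")
    header = "Dictionary notes:\n" + "\n".join(notes) + "\n" if notes else ""
    return f"{header}Translate into {target_lang}: {sentence}"


if __name__ == "__main__":
    print(build_prompt("Stop doomscrolling.", ["doomscrolling"], "German", CORPUS))
```

In a real agentic setup the model itself would decide when to call the search tool and with which query; here the candidate terms are passed in by hand to keep the example self-contained.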

