It Takes Two: A Dual Stage Approach for Terminology-Aware Translation
This work addresses the challenge of accurate terminology usage in machine translation for domains that require precise technical terms, though it appears incremental because it builds on existing NMT and LLM methods.
The paper tackles the problem of terminology-constrained machine translation by introducing DuTerm, a two-stage architecture combining a fine-tuned NMT model with a prompt-based LLM for post-editing. The results show that context-driven terminology handling by the LLM yields higher quality translations than strict constraint enforcement, as evaluated on English-to-German, English-to-Spanish, and English-to-Russian tasks using the WMT 2025 Terminology Shared Task corpus.
This paper introduces DuTerm, a novel two-stage architecture for terminology-constrained machine translation. Our system combines a terminology-aware NMT model, adapted via fine-tuning on large-scale synthetic data, with a prompt-based LLM for post-editing. The LLM stage refines the NMT output and enforces terminology adherence. We evaluate DuTerm on English-to-German, English-to-Spanish, and English-to-Russian with the WMT 2025 Terminology Shared Task corpus. We demonstrate that flexible, context-driven terminology handling by the LLM consistently yields higher-quality translations than strict constraint enforcement. Our results highlight a critical trade-off, revealing that LLMs work best for high-quality translation as context-driven mutators rather than generators.
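The two-stage flow described above can be illustrated with a minimal sketch. It is not the DuTerm implementation: the function names (`missing_terms`, `build_postedit_prompt`, `two_stage_translate`), the prompt wording, and the `nmt` and `llm` callables are all hypothetical placeholders. They stand in for a fine-tuned NMT model and a prompt-based LLM, whose actual interfaces the abstract does not specify.

```python
from typing import Callable, Dict, List


def missing_terms(source: str, translation: str, terminology: Dict[str, str]) -> List[str]:
    """List target terms whose source term occurs in `source` but which
    do not appear in `translation` (simple case-insensitive substring check)."""
    src, out = source.lower(), translation.lower()
    return [
        tgt for s, tgt in terminology.items()
        if s.lower() in src and tgt.lower() not in out
    ]


def build_postedit_prompt(source: str, draft: str,
                          terminology: Dict[str, str], target_lang: str) -> str:
    """Assemble a post-editing prompt (hypothetical wording). It asks for
    context-appropriate term usage rather than forced insertion."""
    term_lines = "\n".join(f"- {s} -> {t}" for s, t in terminology.items())
    return (
        f"Post-edit the draft {target_lang} translation below.\n"
        f"Use these terms where they fit the context:\n{term_lines}\n"
        f"Source: {source}\n"
        f"Draft: {draft}\n"
        f"Return only the revised translation."
    )


def two_stage_translate(source: str, terminology: Dict[str, str], target_lang: str,
                        nmt: Callable[[str], str],
                        llm: Callable[[str], str]) -> str:
    """Stage 1: draft with the NMT model. Stage 2: LLM post-edits the draft."""
    draft = nmt(source)
    prompt = build_postedit_prompt(source, draft, terminology, target_lang)
    return llm(prompt).strip()


# Stand-in models for demonstration only (assumptions, not real systems).
def fake_nmt(text: str) -> str:
    return "Der Server nutzt einen Cache."


def fake_llm(prompt: str) -> str:
    return " Der Server nutzt einen Zwischenspeicher.\n"


terms = {"cache": "Zwischenspeicher"}
src = "The server uses a cache."
result = two_stage_translate(src, terms, "German", fake_nmt, fake_llm)
```

In this sketch the LLM receives the NMT draft and edits it rather than translating from scratch, which mirrors the "mutator rather than generator" role the abstract describes. The `missing_terms` helper is one simple way to check whether a draft or final output still lacks a required term. The abstract does not say how adherence is actually measured.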