CL · Aug 24, 2025

Evaluating the Impact of Verbal Multiword Expressions on Machine Translation

arXiv:2508.17458v1
h-index: 1
Originality: Synthesis-oriented
AI Analysis

This paper addresses a specific challenge for machine translation systems: handling verbal multiword expressions, whose meaning often cannot be derived from their individual words.

The study examined how verbal multiword expressions (VMWEs) affect machine translation from English into multiple languages and found that they consistently reduce translation quality. It also proposed an LLM-based paraphrasing method that replaces these expressions with literal counterparts, which significantly improved translation quality for verbal idioms and verb-particle constructions.

Verbal multiword expressions (VMWEs) present significant challenges for natural language processing due to their complex and often non-compositional nature. While machine translation models have seen significant improvement with the advent of language models in recent years, accurately translating these complex linguistic structures remains an open problem. In this study, we analyze the impact of three VMWE categories -- verbal idioms, verb-particle constructions, and light verb constructions -- on machine translation quality from English to multiple languages. Using both established multiword expression datasets and sentences containing these language phenomena extracted from machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality. We also propose an LLM-based paraphrasing approach that replaces these expressions with their literal counterparts, demonstrating significant improvement in translation quality for verbal idioms and verb-particle constructions.
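The paraphrase-then-translate idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `toy_paraphrase`, `paraphrase_then_translate`, and `TOY_LITERAL_FORMS` are hypothetical, and a small lookup table stands in for the LLM paraphraser so the example runs without any external service. The translation step is passed in as a callable, since the abstract does not name a specific system.

```python
from typing import Callable, Dict

# Hypothetical lookup table standing in for an LLM paraphraser.
# The entries are illustrative only and are not taken from the paper.
TOY_LITERAL_FORMS: Dict[str, str] = {
    "kicked the bucket": "died",      # verbal idiom
    "gave up": "stopped trying",      # verb-particle construction
}


def toy_paraphrase(sentence: str) -> str:
    """Replace each known VMWE with a literal counterpart.

    Assumption: in a real pipeline this would be an LLM call; plain string
    replacement is used here only to keep the sketch self-contained.
    """
    for vmwe, literal in TOY_LITERAL_FORMS.items():
        sentence = sentence.replace(vmwe, literal)
    return sentence


def paraphrase_then_translate(
    sentence: str,
    paraphrase: Callable[[str], str],
    translate: Callable[[str], str],
) -> str:
    """Paraphrase the source sentence first, then translate the result."""
    return translate(paraphrase(sentence))


if __name__ == "__main__":
    # Placeholder "translator" that only tags the text; swap in a real MT call.
    def fake_translate(text: str) -> str:
        return f"[target] {text}"

    print(paraphrase_then_translate("He gave up.", toy_paraphrase, fake_translate))
```

Keeping the paraphraser and translator as interchangeable callables makes it easy to compare translation quality with and without the paraphrasing step on the same sentences.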

