Beyond "To whom it may concern": Tailoring Machine Translation to Audience and Intent
For MT researchers and practitioners, this paper demonstrates that LLMs can effectively tailor translations to audience and intent, but it also shows that current metrics fail to capture adaptation quality.
The paper evaluates purpose-driven machine translation using LLMs across 50 languages, finding that explicit instructions improve adaptedness, especially for informal domains and larger models, and that self-generated instructions close up to 80% of the adaptedness gap to curated ones.
Translation quality depends on purpose: the same source text demands different translations depending on audience, tone, and communicative intent. Yet MT models and metrics treat translation as a fixed mapping from source to target. LLMs enable users to explicitly specify purpose alongside source text, yet this capability has not been evaluated at scale. We introduce a systematic evaluation of purpose-driven MT across 50 languages, 5 model sizes and 8 text domains. We find that (1) explicit instructions substantially improve translation adaptedness, with larger gains on informal domains (conversation, social media), for larger model sizes and for higher-resource languages; (2) instructions outperform semantically-matched few-shot examples and paragraph-level context; (3) traditional MT metrics fail to capture adaptation quality, often penalizing adapted translations; (4) when curated instructions are unavailable, models can self-generate them from surrounding document context, closing up to 80% of the adaptedness gap to curated instructions. Our results establish that purpose-adapted MT is a viable and measurable capability of LLMs, while highlighting the need for purpose-aware metrics.
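To make the "closing up to 80% of the adaptedness gap" claim concrete, the sketch below shows one common way such a figure can be computed: the improvement of self-generated instructions over a no-instruction baseline, divided by the improvement of curated instructions over that same baseline. The abstract does not state the exact formula, so this reading is an assumption, and the function name, argument names, and scores are hypothetical illustrations rather than values from the paper.

```python
def gap_closed(baseline: float, self_generated: float, curated: float) -> float:
    """Fraction of the baseline-to-curated gap recovered by self-generated instructions.

    Assumed definition (not confirmed by the paper):
        (self_generated - baseline) / (curated - baseline)
    All three arguments are adaptedness scores on the same scale.
    """
    gap = curated - baseline
    if gap == 0:
        # No gap to close: the ratio is undefined.
        raise ValueError("curated and baseline scores are equal; gap is zero")
    return (self_generated - baseline) / gap


# Hypothetical scores on a 0-100 scale, chosen only for illustration.
fraction = gap_closed(baseline=40, self_generated=80, curated=90)
print(f"{fraction:.0%} of the gap closed")
```

Under this reading, a value of 1.0 would mean self-generated instructions match curated ones, and 0.0 would mean they add nothing over the baseline.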