CL Oct 31, 2025

Data-Efficient Domain Adaptation for LLM-based MT using Contrastive Preference Optimization

Inacio Vieira, Antonio Castaldo, James O'Doherty, Sheila Castilho

arXiv:2510.27556v1 2.7 h-index: 2

Originality: Incremental advance

AI Analysis

The paper addresses the data-efficiency challenge of domain-specific adaptation in machine translation, though the contribution is incremental because it builds on existing CPO methods.

The paper tackles the problem of expensive domain adaptation for large language models in machine translation by applying contrastive preference optimization to simulate a post-editing workflow, achieving performance close to models trained on over 160k samples with only 14.7k preference pairs.

Large language models (LLMs) often require adaptation to domain-specific requirements, a process that can be expensive when relying solely on supervised fine-tuning (SFT). We present an empirical study on applying Contrastive Preference Optimization (CPO) to simulate a post-editing workflow for data-efficient domain adaptation. Our approach synthesizes preference pairs by treating the base model's own raw output as the 'rejected' translation and the human-approved translation memory (TM) entry as the 'chosen' one. This method provides direct feedback on the model's current knowledge, guiding it to align with domain-specific standards. Experiments in English-Brazilian Portuguese and English-Korean show that, using just 14.7k preference pairs, the model achieves performance close to that of a model trained with SFT on 160k+ samples, demonstrating significant data efficiency. Although we showcase its effectiveness in machine translation (MT), this application of CPO naturally generalizes to other generative tasks where a model's initial drafts can serve as a contrastive signal against a golden reference.
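The pair-synthesis step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `translate` callable, the `prompt`/`chosen`/`rejected` field names, the rule that drops drafts identical to the TM entry, and the `beta` default are all assumptions made for the example. The per-pair loss follows the commonly cited CPO formulation (a sigmoid preference term plus a negative log-likelihood term on the chosen output) and should be checked against the original CPO paper before use.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple


def build_preference_pairs(
    tm_entries: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Turn (source, approved_target) TM entries into preference pairs.

    `translate` is a hypothetical stand-in for the base model's inference call.
    """
    pairs = []
    for source, approved in tm_entries:
        draft = translate(source)  # the base model's own raw output
        # Assumption: a draft identical to the TM entry carries no
        # contrastive signal, so it is skipped.
        if draft.strip() == approved.strip():
            continue
        pairs.append({"prompt": source, "chosen": approved, "rejected": draft})
    return pairs


def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Per-pair CPO-style loss from sequence log-probabilities.

    Preference term: -log(sigmoid(beta * (logp_chosen - logp_rejected)))
    NLL term:        -logp_chosen
    """
    margin = beta * (logp_chosen - logp_rejected)
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    prefer = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    nll = -logp_chosen
    return prefer + nll


if __name__ == "__main__":
    # Toy "base model" that produces a European Portuguese variant.
    toy_outputs = {"Hello": "Olá", "Save file": "Salvar ficheiro"}
    tm = [("Hello", "Olá"), ("Save file", "Salvar arquivo")]
    for pair in build_preference_pairs(tm, toy_outputs.__getitem__):
        print(pair)
```

In this toy run only the second entry yields a pair, since the model's draft for the first already matches the approved translation. The loss shrinks as the model assigns more probability to the approved translation relative to its own earlier draft.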
