CL, Feb 22, 2020

Machine Translation System Selection from Bandit Feedback

arXiv:2002.09646

Originality: Incremental advance

AI Analysis

This paper addresses real-world machine translation adaptation for users whose translation needs vary and change over time. It is an incremental advance built on a selection-based approach.

The paper tackles the problem of adapting machine translation systems to diverse and changing user needs by treating adaptation as a selection task: it trains multiple systems and uses bandit learning to choose one for each translation task. The reported results are quick adaptation to domain changes, better performance than the single best system on mixed-domain tasks, and effective instance-specific decisions when contextual bandit strategies are used.

Adapting machine translation systems in the real world is a difficult problem. In contrast to offline training, users cannot provide the type of fine-grained feedback (such as correct translations) typically used for improving the system. Moreover, different users have different translation needs, and even a single user's needs may change over time. In this work we take a different approach, treating the problem of adaptation as one of selection. Instead of adapting a single system, we train many translation systems using different architectures, datasets, and optimization methods. Using bandit learning techniques on simulated user feedback, we learn a policy to choose which system to use for a particular translation task. We show that our approach can (1) quickly adapt to address domain changes in translation tasks, (2) outperform the single best system in mixed-domain translation tasks, and (3) make effective instance-specific decisions when using contextual bandit strategies.
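The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a specific bandit algorithm, so epsilon-greedy is an assumption chosen for brevity, and the class name `EpsilonGreedySelector`, the function `simulate`, and the Bernoulli "simulated feedback" with per-system means in `mean_quality` are all hypothetical stand-ins.

```python
import random


class EpsilonGreedySelector:
    """Choose one of several translation systems per request (epsilon-greedy).

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, n_systems, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_systems      # how often each system was chosen
        self.values = [0.0] * n_systems    # running mean feedback per system
        self.rng = random.Random(seed)

    def select(self):
        # Explore a random system with probability epsilon, otherwise exploit
        # the system with the highest estimated feedback so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, system, reward):
        # Incremental mean: only the chosen system receives feedback,
        # which is what makes this a bandit setting.
        self.counts[system] += 1
        self.values[system] += (reward - self.values[system]) / self.counts[system]


def simulate(mean_quality, rounds=2000, seed=0):
    """Run the selector against assumed Bernoulli feedback.

    mean_quality[i] is an assumed probability that system i's translation
    receives positive feedback; real feedback would come from users.
    """
    feedback_rng = random.Random(seed + 1)
    selector = EpsilonGreedySelector(len(mean_quality), epsilon=0.1, seed=seed)
    for _ in range(rounds):
        system = selector.select()
        reward = 1.0 if feedback_rng.random() < mean_quality[system] else 0.0
        selector.update(system, reward)
    return selector


if __name__ == "__main__":
    result = simulate([0.3, 0.5, 0.8])
    print("times each system was chosen:", result.counts)
    print("estimated feedback per system:", [round(v, 2) for v in result.values])
```

Because the selector only ever observes a scalar reward for the system it chose, it needs no reference translations. A contextual variant would additionally condition `select` on features of the input sentence, which is what would allow instance-specific decisions.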
