CL · Jul 18, 2022

MAD for Robust Reinforcement Learning in Machine Translation

Domenic Donato, Lei Yu, Wang Ling, Chris Dyer

DeepMind

arXiv:2207.08583v12.18 citations · h-index: 77

Originality: Incremental advance

AI Analysis

This work addresses training challenges in reinforcement learning for machine translation, offering a robust method that improves stability and generalization, though it is incremental as it builds on existing policy gradient approaches.

The authors address training instability and poor generalization in reward-aware machine translation by introducing the MAD algorithm. In experiments across a variety of translation tasks, MAD outperformed existing methods such as REINFORCE, MRT, and PPO, and performed well with both greedy decoding and beam search.

We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.
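The two variance reduction strategies named in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration rather than the authors' method: rewards are centered on the per-source mean, and each candidate is weighted by how far its log-probability lies from the per-source mean, measured in units of the mean absolute deviation. The function names (`center_rewards`, `mean_absolute_deviation`, `robust_weights`, `candidate_coefficients`) and the `eps` parameter are hypothetical.

```python
from statistics import mean


def center_rewards(rewards):
    """Subtract the per-source mean reward (assumed normalization).

    Unless all candidates score the same, the result contains both
    positive and negative values for the source sentence.
    """
    mu = mean(rewards)
    return [r - mu for r in rewards]


def mean_absolute_deviation(values):
    """Mean absolute deviation of the values around their mean."""
    mu = mean(values)
    return mean(abs(v - mu) for v in values)


def robust_weights(log_probs, eps=1e-8):
    """Hypothetical robust weighting, not the paper's formula.

    Candidates whose log-probability is within one MAD of the
    per-source mean get weight 1; those further away are scaled
    down in proportion to their distance.
    """
    mu = mean(log_probs)
    mad = mean_absolute_deviation(log_probs)
    return [min(1.0, (mad + eps) / (abs(lp - mu) + eps)) for lp in log_probs]


def candidate_coefficients(rewards, log_probs):
    """Per-candidate scale for a policy-gradient term: centered reward times weight."""
    centered = center_rewards(rewards)
    weights = robust_weights(log_probs)
    return [a * w for a, w in zip(centered, weights)]
```

In a policy gradient update, each coefficient would multiply the gradient of the corresponding candidate's log-probability. Capping the weights at 1 keeps a single unusually likely or unlikely sample from dominating the update for its source sentence.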
