CL AI HC LG Jul 24, 2017

Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback

Khanh Nguyen, Hal Daumé, Jordan Boyd-Graber

arXiv:1707.07402v4 39.91134 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses the cost and scalability of training machine translation systems. It is an incremental advance because it builds on existing reinforcement learning and attention-based methods.

The paper tackles the problem of expensive human-generated reference translations in neural machine translation by proposing a reinforcement learning algorithm that improves systems using simulated human feedback, achieving effective optimization of traditional corpus-level machine translation metrics.

Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.
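To make "skewed, high-variance, granular feedback" concrete, here is a minimal sketch of how a true sentence-level quality score in [0, 1] might be perturbed into a simulated human rating. The function names, the power-law skew, the Gaussian noise, and the default parameters are all assumptions chosen for illustration; they are not taken from the paper, which may define its perturbations differently.

```python
import random


def clip01(x):
    """Clamp a value to the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def skewed(score, power=2.0):
    """Hypothetical skew: power > 1 models a harsh rater, power < 1 a lenient one."""
    return clip01(score) ** power


def noisy(score, std=0.1, rng=random):
    """Hypothetical variance: add zero-mean Gaussian noise, then clamp to [0, 1]."""
    return clip01(score + rng.gauss(0.0, std))


def granular(score, bins=5):
    """Hypothetical granularity: snap to the nearest of `bins` evenly spaced levels."""
    return round(clip01(score) * (bins - 1)) / (bins - 1)


def simulate_rating(true_score, bins=5, std=0.1, power=1.0, rng=random):
    """Compose the three perturbations into one simulated rating (illustrative order)."""
    return granular(noisy(skewed(true_score, power), std, rng), bins)


def advantage(rating, baseline):
    """Advantage signal in an actor-critic setup: observed reward minus the critic's estimate."""
    return rating - baseline


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same ratings
    rating = simulate_rating(0.62, bins=5, std=0.1, power=1.5, rng=rng)
    print("simulated rating:", rating)
    print("advantage vs. baseline 0.5:", advantage(rating, 0.5))
```

In a bandit setting the learner would see only the simulated rating for the translation it produced, never the reference. The advantage then scales the policy-gradient update for that one sampled translation.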

