CL, AI, HC, LG · Jul 24, 2017

Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback

arXiv:1707.07402v41134 citations
Originality: Incremental advance
AI Analysis

This work addresses the cost and scalability of machine translation training for developers and users. It is an incremental advance because it builds on existing RL and attention-based methods.

The paper tackles the problem of expensive human-generated reference translations in neural machine translation by proposing a reinforcement learning algorithm that improves systems using simulated human feedback, achieving effective optimization of traditional corpus-level machine translation metrics.

Machine translation is a natural candidate problem for reinforcement learning from human feedback: users provide quick, dirty ratings on candidate translations to guide a system to improve. Yet, current neural machine translation training focuses on expensive human-generated reference translations. We describe a reinforcement learning algorithm that improves neural machine translation systems from simulated human feedback. Our algorithm combines the advantage actor-critic algorithm (Mnih et al., 2016) with the attention-based neural encoder-decoder architecture (Luong et al., 2015). This algorithm (a) is well-designed for problems with a large action space and delayed rewards, (b) effectively optimizes traditional corpus-level machine translation metrics, and (c) is robust to skewed, high-variance, granular feedback modeled after actual human behaviors.
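
The two ideas in the abstract, perturbing a clean score so it resembles human ratings and scaling the policy update by an advantage, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the 5-level rating scale, the power-law skew, and the Gaussian noise are hypothetical stand-ins, not the perturbation functions or training code from the paper. The losses are shown for a single scalar log-probability rather than a full attention-based encoder-decoder.

```python
import math
import random


def granular(score, levels=5):
    """Snap a score in [0, 1] to one of `levels` evenly spaced ratings."""
    return round(score * (levels - 1)) / (levels - 1)


def skewed(score, power=2.0):
    """Skew a score in [0, 1]: power > 1 mimics a harsh rater, power < 1 a lenient one."""
    return score ** power


def noisy(score, sigma, rng):
    """Add zero-mean Gaussian noise, then clip back to [0, 1]."""
    return min(1.0, max(0.0, score + rng.gauss(0.0, sigma)))


def simulate_feedback(true_score, rng, levels=5, power=2.0, sigma=0.1):
    """Hypothetical simulated rating: skew, add noise, then discretize."""
    return granular(noisy(skewed(true_score, power), sigma, rng), levels)


def actor_critic_losses(log_prob, reward, baseline):
    """Advantage actor-critic losses for one sampled translation.

    log_prob: log-probability the policy (actor) assigned to the sample.
    reward:   the (simulated) rating received for that sample.
    baseline: the critic's prediction of the reward.
    """
    advantage = reward - baseline
    actor_loss = -advantage * log_prob  # the advantage is treated as a constant weight
    critic_loss = advantage ** 2        # squared error of the critic's prediction
    return actor_loss, critic_loss


if __name__ == "__main__":
    rng = random.Random(0)
    rating = simulate_feedback(0.8, rng)
    print(rating, actor_critic_losses(math.log(0.5), rating, baseline=0.5))
```

In this sketch the system sees only the single rating for the translation it produced, which is what makes the setting a bandit problem: no reference translation is ever revealed, and the critic's baseline serves to reduce the variance of the update.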

Code Implementations: 1 repo
