LG · Jul 24, 2016

An Actor-Critic Algorithm for Sequence Prediction

arXiv:1607.07086v3 · 669 citations
AI Analysis

This paper addresses a key limitation of sequence prediction in natural language generation, though the contribution is incremental: it adapts existing RL techniques to the supervised learning setting.

The paper tackles the discrepancy between training and testing in neural sequence generation by introducing an actor-critic method that optimizes task-specific scores like BLEU, leading to improved performance on a synthetic task and German-English machine translation.

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task, and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
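The interplay between actor and critic described in the abstract can be illustrated with a minimal sketch for a single decoding step. This is not the paper's implementation. It assumes a softmax actor over a three-token vocabulary, and the per-token values in `q_values` are made-up numbers standing in for the output of a critic that, per the abstract, would be conditioned on the ground-truth output. All function names are hypothetical.

```python
import math

def softmax(logits):
    """Turn the actor's logits into a probability distribution over tokens."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_value(probs, q_values):
    """Expected critic value under the actor's token distribution."""
    return sum(p * q for p, q in zip(probs, q_values))

def actor_logit_gradient(probs, q_values):
    """Gradient of the expected value with respect to the actor's logits.

    For a softmax policy, dV/dz_i = p_i * (Q_i - V), where V = sum_j p_j * Q_j.
    """
    v = expected_value(probs, q_values)
    return [p * (q - v) for p, q in zip(probs, q_values)]

def critic_td_target(reward, next_probs, next_q_values):
    """One-step temporal-difference target for the critic (assumed form):
    the immediate reward plus the expected value at the next step."""
    return reward + expected_value(next_probs, next_q_values)

# Hypothetical numbers: a uniform actor and a critic that prefers token 0.
logits = [0.0, 0.0, 0.0]
q_values = [1.0, 0.0, -1.0]  # placeholder critic outputs, not real data

probs = softmax(logits)
grad = actor_logit_gradient(probs, q_values)

# One gradient-ascent step moves probability mass toward the high-value token.
learning_rate = 0.5
new_logits = [z + learning_rate * g for z, g in zip(logits, grad)]
new_probs = softmax(new_logits)
```

In this sketch the actor is updated using the critic's estimates for every candidate token rather than a single sampled sequence-level score, which is one way to read "predict the value of an output token". The reward passed to `critic_td_target` would come from the task-specific score (for example, a BLEU-based signal), which is left abstract here.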

Code Implementations: 3 repos