LG CL CV · Jun 9, 2015

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

arXiv:1506.03099v3 · 2332 citations

Originality: Incremental advance

AI Analysis

This paper addresses a key problem for researchers and practitioners working on sequence generation tasks such as machine translation and image captioning, although it is an incremental improvement on existing training methods.

The paper tackles the mismatch between training and inference in recurrent neural networks for sequence prediction. During training, the model is conditioned on the true previous tokens, but at inference it must condition on tokens it generated itself, so errors can accumulate along the sequence. To close this gap, the authors propose a curriculum learning strategy that yielded significant improvements on several sequence prediction tasks and was used in their winning entry to the 2015 MSCOCO image captioning challenge.

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the likelihood of each token in the sequence given the current (recurrent) state and the previous token. At inference, the unknown previous token is then replaced by a token generated by the model itself. This discrepancy between training and inference can yield errors that can accumulate quickly along the generated sequence. We propose a curriculum learning strategy to gently change the training process from a fully guided scheme using the true previous token, towards a less guided scheme which mostly uses the generated token instead. Experiments on several sequence prediction tasks show that this approach yields significant improvements. Moreover, it was used successfully in our winning entry to the MSCOCO image captioning challenge, 2015.
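The curriculum described in the abstract can be sketched as follows. This is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes a toy `predict_next` function standing in for the RNN decoder (a real decoder conditions on its full recurrent state, not only on the previous token), and the schedule constants are illustrative. The three decay shapes for ε_i, the probability of feeding the true previous token at training step i (linear, exponential, inverse sigmoid), follow the paper. The per-token coin flip shifts training from the fully guided scheme toward the model's own tokens.

```python
import math
import random
from typing import Callable, List, Sequence

# Decay schedules for epsilon_i: probability of feeding the TRUE previous token
# at training step i. Shapes follow the paper; default constants are assumptions.

def linear_decay(i: int, k: float = 1.0, c: float = 1e-4, eps_min: float = 0.0) -> float:
    """epsilon_i = max(eps_min, k - c * i)"""
    return max(eps_min, k - c * i)

def exponential_decay(i: int, k: float = 0.9999) -> float:
    """epsilon_i = k ** i, with k < 1"""
    return k ** i

def inverse_sigmoid_decay(i: int, k: float = 1000.0) -> float:
    """epsilon_i = k / (k + exp(i / k)), with k >= 1"""
    x = i / k
    if x > 700:  # math.exp would overflow a float here; the value is ~0 anyway
        return 0.0
    return k / (k + math.exp(x))

def scheduled_inputs(
    targets: Sequence[int],
    predict_next: Callable[[int], int],  # hypothetical stand-in for the decoder
    epsilon: float,
    start_token: int,
    rng: random.Random,
) -> List[int]:
    """Build the decoder inputs for one training sequence.

    Step 0 receives start_token. At each later step t, a coin flip decides:
    with probability epsilon feed the true token targets[t - 1] (teacher
    forcing), otherwise feed the model's own output for step t - 1.
    """
    if not targets:
        return []
    inputs = [start_token]
    for t in range(1, len(targets)):
        # Model's token for step t-1, given the input it actually saw at t-1.
        generated = predict_next(inputs[-1])
        if rng.random() < epsilon:
            inputs.append(targets[t - 1])  # true previous token
        else:
            inputs.append(generated)  # model-generated previous token
    return inputs

if __name__ == "__main__":
    rng = random.Random(0)
    targets = [5, 6, 7, 8]
    toy_model = lambda prev: (prev + 2) % 10  # hypothetical toy predictor
    for step in (0, 5000, 50000):
        eps = inverse_sigmoid_decay(step)
        print(step, round(eps, 3), scheduled_inputs(targets, toy_model, eps, 0, rng))
```

Early in training ε_i is close to 1, so the inputs match the ground truth (standard training with the true previous token). As i grows, more inputs come from the model itself, approaching the conditions the model faces at inference.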
