CL · Jun 18, 2019

Scheduled Sampling for Transformers

arXiv:1906.07651v1 · 105 citations
Originality: Incremental advance
AI Analysis

This paper addresses exposure bias in Transformer-based NLP tasks, but the contribution is incremental: it adapts an existing technique, scheduled sampling, to a new architecture.

The paper tackles exposure bias in sequence-to-sequence generation by adapting scheduled sampling to Transformer models, and reports performance close to teacher-forcing baselines in experiments on two language pairs.

Scheduled sampling is a technique for avoiding one of the known problems in sequence-to-sequence generation: exposure bias. It consists of feeding the model a mix of the teacher-forced embeddings and the model predictions from the previous step at training time. The technique has been used to improve model performance with recurrent neural networks (RNNs). In the Transformer model, unlike the RNN, the generation of a new word attends to the full sentence generated so far, not only to the last word, so applying scheduled sampling is not straightforward. We propose some structural changes that allow scheduled sampling to be applied to the Transformer architecture, via a two-pass decoding strategy. Experiments on two language pairs achieve performance close to a teacher-forcing baseline and show that this technique is promising for further exploration.
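The two-pass idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear decay schedule, and the toy `decode` callable are all assumptions made for illustration, and the sketch mixes token ids where the paper mixes embeddings. It only shows how the decoder inputs for the second pass might be assembled from gold tokens and first-pass predictions.

```python
import random


def teacher_forcing_prob(step, total_steps, floor=0.0):
    """Probability of keeping the gold token at a given training step.

    Assumed schedule: linear decay from 1.0 down to `floor`.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max(floor, 1.0 - frac)


def mix_inputs(gold, predicted, p_gold, rng):
    """Per position, keep the gold token with probability p_gold,
    otherwise take the model's own first-pass prediction."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return [g if rng.random() < p_gold else p for g, p in zip(gold, predicted)]


def two_pass_decoder_inputs(decode, gold_target, bos, p_gold, rng):
    """Build the decoder inputs for the second pass.

    `decode` is a hypothetical stand-in for a teacher-forced decoder call:
    it maps a list of input tokens to one predicted token per position.
    """
    # Pass 1: ordinary teacher forcing, inputs are the gold target shifted right.
    shifted_gold = [bos] + gold_target[:-1]
    first_pass = decode(shifted_gold)
    # The prediction for position i is a candidate input at position i + 1.
    shifted_pred = [bos] + first_pass[:-1]
    # Pass 2 would run the decoder on this mix and compute the loss there.
    return mix_inputs(shifted_gold, shifted_pred, p_gold, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    toy_decode = lambda tokens: [t + 1 for t in tokens]  # toy model, not a real decoder
    p = teacher_forcing_prob(step=50, total_steps=100)
    print(two_pass_decoder_inputs(toy_decode, [5, 6, 7], bos=0, p_gold=p, rng=rng))
```

With `p_gold` at 1.0 the second pass sees exactly the teacher-forced inputs, and with `p_gold` at 0.0 it sees only the model's own predictions, so the schedule moves training gradually toward inference-like conditions.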

Code Implementations: 1 repo
