CL · Oct 7, 2020

TeaForN: Teacher-Forcing with N-grams

Sebastian Goodman, Nan Ding, Radu Soricut

arXiv:2010.03494v231.1998 citations

Originality: Incremental advance

AI Analysis

The paper addresses a fundamental issue in sequence generation for tasks such as machine translation and summarization, but it is an incremental improvement over existing teacher-forcing methods.

The paper tackles two problems of sequence generation models trained with teacher-forcing: exposure bias and lack of differentiability across timesteps. It proposes TeaForN, which uses a stack of N decoders so that parameter updates are based on N prediction steps, and reports improved generation quality on the WMT 2014 English-French, CNN/Dailymail, and Gigaword benchmarks.

Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
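The idea of a decoder stack along a secondary time axis can be illustrated with a toy sketch. This is not the authors' implementation: the decoder below (`toy_decoder`, a causal prefix average followed by a projection), the use of expected embeddings as the input passed from one decoder to the next, and the `discount` weighting of per-decoder losses are all assumptions made for illustration. The first decoder sees gold tokens, as in standard teacher-forcing; each later decoder shares the same parameters, reads the previous decoder's soft predictions, and is scored against the target one step further ahead.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def toy_decoder(inputs, W):
    """Hypothetical causal decoder: position t sees only inputs[0..t].
    inputs: (T, d) input embeddings, W: (d, V) projection. Returns (T, V) logits."""
    T = inputs.shape[0]
    prefix_mean = np.cumsum(inputs, axis=0) / np.arange(1, T + 1)[:, None]
    return prefix_mean @ W

def teaforn_loss(targets, E, W, bos_id, n_decoders=2, discount=1.0):
    """Sketch of an N-step loss over a stack of decoders with shared parameters.
    targets: (T,) gold token ids, E: (V, d) embedding table.
    Returns (total_loss, list of per-decoder mean negative log-likelihoods)."""
    T = len(targets)
    assert 1 <= n_decoders <= T
    # Decoder 0 is plain teacher-forcing: gold tokens shifted right by one.
    prev_tokens = np.concatenate([[bos_id], targets[:-1]])
    inputs = E[prev_tokens]
    total, per_decoder = 0.0, []
    for n in range(n_decoders):
        probs = softmax(toy_decoder(inputs, W))
        # Decoder n at position t is scored against targets[t + n],
        # so only the first T - n positions have a gold label.
        valid = T - n
        nll = -np.log(probs[np.arange(valid), targets[n:]]).mean()
        per_decoder.append(nll)
        total += (discount ** n) * nll
        # Assumption: the next decoder reads the expected embedding of this
        # decoder's prediction, which keeps the whole stack differentiable.
        inputs = probs @ E
    return total, per_decoder

# Toy usage with random parameters (values are arbitrary).
rng = np.random.default_rng(0)
E = rng.normal(size=(7, 4))
W = rng.normal(size=(4, 7))
targets = np.array([3, 1, 4, 1, 5])
total, per_decoder = teaforn_loss(targets, E, W, bos_id=0, n_decoders=3, discount=0.5)
print(total, per_decoder)
```

With `n_decoders=1` the sketch reduces to the ordinary teacher-forcing loss, which matches the abstract's point that the method needs only small changes to a standard setup. Feeding soft predictions forward, rather than sampled tokens, is one way to keep gradients flowing across prediction steps; it is used here only as a simple stand-in.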
