CL · CV · May 14, 2018

Token-level and sequence-level loss smoothing for RNN language models

arXiv:1805.05062v1 · 1100 citations
Originality: Incremental advance
AI Analysis

This work addresses limitations of maximum likelihood training for RNN language models in applications such as image captioning and machine translation. It represents an incremental advance over existing loss-smoothing approaches.

The paper tackles two limitations of maximum likelihood estimation (MLE) for RNN language models: MLE ignores the structure of the output space, and it suffers from exposure bias. To address them, it proposes token-level and sequence-level loss smoothing, which yield significant improvements on image captioning and machine translation.
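The token-level idea can be illustrated with a minimal sketch. Instead of a one-hot target at each time step, the target spreads probability mass over vocabulary words in proportion to exp(r / tau), where r measures how close a word is to the ground-truth token. In this sketch, r is the cosine similarity between word embeddings, and `tau` is a temperature. The similarity measure, the temperature, and the helper names `smoothed_token_target` and `smoothed_cross_entropy` are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm  # assumes neither vector is all zeros


def smoothed_token_target(gt_index, embeddings, tau=0.1):
    """Hypothetical helper: target distribution over the vocabulary that
    puts mass on words similar to the ground-truth word, p(w) ∝ exp(r(w) / tau),
    with r(w) = cosine(e_w, e_gt) (an assumed similarity measure)."""
    gt = embeddings[gt_index]
    scores = [cosine(e, gt) / tau for e in embeddings]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def smoothed_cross_entropy(log_probs, target):
    """Cross-entropy of the model's log-probabilities against the smoothed target."""
    return -sum(t * lp for t, lp in zip(target, log_probs))


# Toy vocabulary of three words with 2-D embeddings; word 1 is close to word 0.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
target = smoothed_token_target(0, embeddings, tau=0.1)
uniform = [math.log(1.0 / 3)] * 3
loss = smoothed_cross_entropy(uniform, target)
```

With a small `tau`, the target approaches the usual one-hot MLE target. A larger `tau` spreads more mass onto similar words, such as word 1 in the toy vocabulary. As a result, a prediction of a near-synonym is penalized less than a prediction of an unrelated word.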

Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from "exposure bias": during training, tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations, we build upon the recent reward augmented maximum likelihood approach, i.e., sequence-level smoothing that encourages the model to predict sentences close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that token-level and sequence-level loss smoothing are complementary and significantly improve results.
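Sequence-level smoothing in the reward augmented maximum likelihood (RAML) style can be sketched as follows. The model is trained with ordinary MLE on sequences sampled near the ground truth, rather than only on the ground truth itself. This sketch assumes the reward is the negative Hamming distance, which permits exact stratified sampling. First, a distance e is drawn with probability proportional to C(T, e) (V - 1)^e exp(-e / tau). Then e randomly chosen positions are substituted. The paper uses task metrics that may differ, and the helper names here are hypothetical.

```python
import math
import random


def hamming_distance_distribution(seq_len, vocab_size, tau):
    """q(e) ∝ C(T, e) * (V - 1)^e * exp(-e / tau) for e = 0..T.
    C(T, e) * (V - 1)^e counts sequences at Hamming distance e; assumes V >= 2."""
    log_w = [
        math.log(math.comb(seq_len, e)) + e * math.log(vocab_size - 1) - e / tau
        for e in range(seq_len + 1)
    ]
    m = max(log_w)  # work in log space to avoid overflow for long sequences
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    return [x / z for x in w]


def sample_raml_target(ground_truth, vocab_size, tau, rng=random):
    """Hypothetical helper: sample a sequence with probability ∝ exp(r / tau),
    where r = -Hamming(sample, ground_truth). Tokens are ints in [0, vocab_size)."""
    T = len(ground_truth)
    probs = hamming_distance_distribution(T, vocab_size, tau)
    e = rng.choices(range(T + 1), weights=probs, k=1)[0]  # number of edits
    positions = rng.sample(range(T), e)  # distinct positions to substitute
    sample = list(ground_truth)
    for p in positions:
        # Draw uniformly among the V - 1 tokens that differ from the ground truth.
        new_tok = rng.randrange(vocab_size - 1)
        if new_tok >= sample[p]:
            new_tok += 1
        sample[p] = new_tok
    return sample


rng = random.Random(0)
gt = [1, 2, 3, 4]
noisy = sample_raml_target(gt, vocab_size=10, tau=1.0, rng=rng)
```

In training, each minibatch would replace (or augment) the ground-truth target with such a sample and apply the standard cross-entropy loss. As `tau` goes to 0, sampling collapses onto the ground truth and plain MLE is recovered.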

Code Implementations: 1 repo