CL · Jan 18, 2019

Improving Sequence-to-Sequence Learning via Optimal Transport

arXiv:1901.06283v1 · 97 citations
Originality: Incremental advance
AI Analysis

This work addresses semantic coherence in sequence generation, a concern for NLP practitioners. It is an incremental advance because it builds on existing sequence-to-sequence frameworks.

The paper addresses the failure of sequence-to-sequence models to capture long-range semantic structure when trained with word-level maximum likelihood. It introduces a supervision method based on optimal transport that imposes global sequence-level guidance, and it reports consistent improvements across NLP tasks including machine translation, abstractive text summarization, and image captioning.

Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This procedure focuses on modeling local syntactic patterns, and may fail to capture long-range semantic structure. We present a novel solution to alleviate these issues. Our approach imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features. We further show that this method can be understood as a Wasserstein gradient flow trying to match our model to the ground truth sequence distribution. Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine translation, abstractive text summarization, and image captioning.
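
To make the idea of sequence-level supervision via optimal transport concrete, the following is a minimal sketch of an OT distance between two sequences of word embeddings. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which solver or cost function the authors use, so this sketch swaps in the standard entropy-regularized Sinkhorn iteration with a cosine-distance cost and uniform word weights. The function names `cosine_cost` and `sinkhorn_ot` and the parameters `reg` and `n_iters` are hypothetical.

```python
import numpy as np


def cosine_cost(x, y, eps=1e-8):
    """Pairwise cosine distance between rows of x (n, d) and rows of y (m, d)."""
    x_n = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    y_n = y / (np.linalg.norm(y, axis=1, keepdims=True) + eps)
    return 1.0 - x_n @ y_n.T


def sinkhorn_ot(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT between two uniformly weighted sequences.

    Assumption: Sinkhorn is used here only as a well-known stand-in solver.
    Returns the transport cost and the (n, m) transport plan.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass on each generated word
    b = np.full(m, 1.0 / m)   # uniform mass on each reference word
    K = np.exp(-cost / reg)   # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)     # rescale to match column marginals
        u = a / (K @ v)       # rescale to match row marginals
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost)), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    generated = rng.normal(size=(5, 16))   # hypothetical embeddings, 5 words
    reference = rng.normal(size=(7, 16))   # hypothetical embeddings, 7 words
    distance, plan = sinkhorn_ot(cosine_cost(generated, reference))
    print("OT distance:", distance)
    print("plan shape:", plan.shape)
```

Because the distance compares the two sequences as whole sets of embeddings, it is insensitive to word order and tolerant of length mismatch, which is what lets it act as a global signal alongside the word-level MLE objective. In a training setup, a differentiable version of this quantity would be added to the MLE loss as a regularizer.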

