CL, Dec 21, 2017

Variational Attention for Sequence-to-Sequence Models

arXiv:1712.08207v31143 citations
Originality: Incremental advance
AI Analysis

The paper addresses a specific technical issue for NLP researchers, but the contribution is incremental because it builds on existing variational encoder-decoder frameworks.

The paper tackles the problem of variational latent spaces being bypassed by deterministic attention in sequence-to-sequence models, proposing a variational attention mechanism that increases sentence diversity without quality loss.

The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian distributed random variables. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences.
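The core idea, treating the attention vector as a Gaussian random variable rather than a fixed value, can be illustrated with a minimal sketch. The abstract does not give the details, so everything below is an assumption made for illustration: dot-product scoring, a diagonal Gaussian whose mean is the deterministic attention vector, a standard normal prior, and all function names (`deterministic_attention`, `sample_variational_attention`, `kl_to_standard_normal`) are hypothetical and not taken from the paper or its code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deterministic_attention(query, encoder_states):
    """Standard attention: weighted sum of encoder states.

    Dot-product scoring is an assumption; the source does not name a score function.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * state[d] for w, state in zip(weights, encoder_states))
            for d in range(dim)]


def sample_variational_attention(mean, log_var, rng):
    """Reparameterized sample a = mean + sigma * eps, with eps ~ N(0, 1).

    Assumes a diagonal Gaussian; log_var would come from a learned layer.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    The standard normal prior is an assumption made for this sketch.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy encoder states
    query = [1.0, 0.0]                              # toy decoder state
    mean = deterministic_attention(query, states)
    log_var = [-2.0, -2.0]                          # placeholder for a learned value
    sample = sample_variational_attention(mean, log_var, rng)
    print("mean:", mean)
    print("sample:", sample)
    print("KL:", kl_to_standard_normal(mean, log_var))
```

In a training loop, a KL term of this kind would typically be added to the reconstruction loss, so the decoder receives a noisy attention vector instead of a deterministic shortcut around the latent space.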

Code Implementations: 2 repos
