CL · Feb 15, 2019

Context-Aware Self-Attention Networks

arXiv:1902.05766v1 · 122 citations
Originality: Incremental advance
AI Analysis

This work addresses a limitation in self-attention models for machine translation, offering an incremental improvement by incorporating context without external resources.

The paper tackles the problem that self-attention networks compute dependencies between representations without using contextual information. It proposes contextualizing the query and key layers with the model's own internal representations, and reports improved performance on the WMT14 English-German and WMT17 Chinese-English translation tasks.

Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies. However, they calculate the dependencies between representations without considering contextual information, which has proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on improving self-attention networks by capturing the richness of context. To maintain the simplicity and flexibility of self-attention networks, we propose to contextualize the transformations of the query and key layers, which are used to calculate the relevance between elements. Specifically, we leverage internal representations that embed both global and deep contexts, thus avoiding reliance on external resources. Experimental results on the WMT14 English-German and WMT17 Chinese-English translation tasks demonstrate the effectiveness and universality of the proposed methods. Furthermore, we conduct extensive analyses to quantify how the context vectors participate in the self-attention model.
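To make the idea of contextualizing the query and key transformations concrete, here is a minimal sketch in plain Python. It rests on assumptions that the abstract does not confirm: the global context is taken as the mean of the input representations, and the context is mixed into the queries and keys through a learned sigmoid gate. All function and parameter names (`global_context`, `contextualize`, `u`, `v_h`, `v_c`) are hypothetical, and the deep-context variant is not shown.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_context(h):
    """ASSUMPTION: summarize the whole sequence as the mean of its token vectors."""
    n = len(h)
    return [sum(col) / n for col in zip(*h)]


def contextualize(proj, ctx, u, v_h, v_c):
    """ASSUMPTION: gated mix of each projected vector with the projected context.

    proj: n x d queries or keys; ctx: context vector of size d_model;
    u: d_model x d context projection; v_h, v_c: gate vectors of size d.
    """
    c_proj = matmul([ctx], u)[0]
    mixed = []
    for row in proj:
        gate = sigmoid(
            sum(r * a for r, a in zip(row, v_h))
            + sum(c * b for c, b in zip(c_proj, v_c))
        )
        mixed.append([(1.0 - gate) * r + gate * c for r, c in zip(row, c_proj)])
    return mixed


def context_aware_attention(h, w_q, w_k, w_v, gate_q, gate_k):
    """Scaled dot-product self-attention with contextualized queries and keys."""
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    ctx = global_context(h)
    q_hat = contextualize(q, ctx, *gate_q)
    k_hat = contextualize(k, ctx, *gate_k)
    scale = math.sqrt(len(q_hat[0]))
    k_hat_t = [list(col) for col in zip(*k_hat)]  # transpose: d x n
    scores = matmul(q_hat, k_hat_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Usage with toy values: three tokens, model size 2, identity projections.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
gate = (eye, [0.0, 0.0], [0.0, 0.0])  # zero gate vectors give a constant gate of 0.5
output, attention = context_aware_attention(h, eye, eye, eye, gate, gate)
```

Only the queries and keys are modified in this sketch; the values and the softmax step are left as in standard scaled dot-product attention, which is one way to read the abstract's aim of keeping the self-attention network simple and flexible.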

