CL · Oct 31, 2018

Convolutional Self-Attention Network

arXiv:1810.13320v2 · 11 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better locality modeling in self-attention networks for machine translation, representing an incremental improvement.

The paper proposes a convolutional self-attention network (CSAN) that enhances self-attention networks in two ways: it captures neighboring dependencies, and it models interactions between attention heads. On the WMT14 English-to-German translation task, CSAN outperforms both the Transformer baseline and other locality-enhancing methods, and it does so without adding new parameters.

Self-attention networks (SANs) have recently attracted increasing interest due to their fully parallelized computation and flexibility in modeling dependencies. They can be further enhanced with a multi-headed attention mechanism, which allows the model to jointly attend to information from different representation subspaces at different positions (Vaswani et al., 2017). In this work, we propose a novel convolutional self-attention network (CSAN), which gives SANs the ability to 1) capture neighboring dependencies, and 2) model the interaction between multiple attention heads. Experimental results on the WMT14 English-to-German translation task demonstrate that the proposed approach outperforms both the strong Transformer baseline and other existing work on enhancing the locality of SANs. Compared with previous work, our model does not introduce any new parameters.
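The idea of capturing neighboring dependencies can be illustrated with a minimal sketch of windowed self-attention, in which each position attends only to keys inside a fixed-size window centered on it. This is an illustration under stated assumptions, not the paper's implementation: the function name `local_self_attention`, the `window` argument, and the scaled dot-product scoring are all choices made here, and the interaction between attention heads is not modeled. The sketch adds no trainable weights; it only restricts which positions are scored.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(queries, keys, values, window=3):
    """Windowed scaled dot-product attention (hypothetical helper).

    queries, keys, values: lists of equal length holding float vectors.
    window: assumed odd window size; position i attends to positions
            i - window // 2 .. i + window // 2, clipped to the sequence.
    """
    n = len(queries)
    dim = len(queries[0])
    half = window // 2
    outputs = []
    for i in range(n):
        # Restrict the attended span to the local neighborhood of i.
        start = max(0, i - half)
        stop = min(n, i + half + 1)
        span = range(start, stop)
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in span
        ]
        weights = softmax(scores)
        # Weighted sum of the values inside the window only.
        outputs.append([
            sum(w * values[j][c] for w, j in zip(weights, span))
            for c in range(len(values[0]))
        ])
    return outputs
```

With `window=1` each position attends only to itself, so the output equals `values`; a window at least as wide as `2 * n - 1` recovers ordinary global self-attention over a sequence of length `n`.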

