CL · LG · Dec 25, 2019

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection

arXiv:1912.11637v1 · 151 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses efficiency and performance issues in natural language processing and computer vision tasks, but the advance is incremental because it builds on existing sparse attention methods.

The authors address the extraction of irrelevant information in self-attention Transformers by proposing the Explicit Sparse Transformer, which concentrates attention through explicit selection of the most relevant segments. The method achieves comparable or better performance than the previous sparse attention method while significantly reducing training and testing time; for example, inference is twice as fast as with sparsemax.

The self-attention-based Transformer has demonstrated state-of-the-art performance in a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from the extraction of irrelevant information in the context. To tackle the problem, we propose a novel model called Explicit Sparse Transformer. Explicit Sparse Transformer is able to improve the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing and computer vision tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Explicit Sparse Transformer in model performance. We also show that our proposed sparse attention method achieves comparable or better results than the previous sparse attention method, but significantly reduces training and testing time. For example, the inference speed is twice that of sparsemax in the Transformer model. Code will be available at https://github.com/lancopku/Explicit-Sparse-Transformer.
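The abstract does not spell out the selection rule, so the following is only a minimal sketch of one way to read "explicit selection of the most relevant segments". It assumes the selection is a top-k over each query's attention scores, with everything else masked out before the softmax. The names `topk_softmax` and `sparse_attention` and the parameter `k` are illustrative and not taken from the authors' code.

```python
import math

def topk_softmax(scores, k):
    """Softmax restricted to the k largest scores; the rest get weight 0.

    Assumption: selection is a per-query top-k. Ties at the threshold are
    all kept, so more than k entries may survive when scores are equal.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # The k-th largest score acts as the selection threshold.
    threshold = sorted(scores, reverse=True)[k - 1]
    # Mask everything below the threshold with -inf so exp() maps it to 0.
    kept = [s if s >= threshold else -math.inf for s in scores]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in kept]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_attention(query, keys, values, k):
    """Scaled dot-product attention for a single query with top-k selection."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    weights = topk_softmax(scores, k)
    dim = len(values[0])
    # Weighted sum of value vectors; masked positions contribute nothing.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

With `k` equal to the number of keys, the sketch reduces to ordinary softmax attention; smaller `k` forces the weight onto fewer positions. Unlike sparsemax, this form needs only a sort (or partial sort) followed by a standard softmax.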

Code Implementations: 2 repos