LG, Feb 20, 2021

Evolving Attention with Residual Convolutions

arXiv:2102.12895v1, 43 citations
Originality: Highly original
AI Analysis

This paper addresses a bottleneck in transformer models across several AI tasks and offers a novel way to enhance attention mechanisms, though the contribution appears incremental because it builds on existing transformer architectures.

The paper tackles the problem that attention maps in transformers are learned independently in each layer and sometimes fail to capture precise patterns. It proposes an evolving attention mechanism that uses residual connections and convolutional layers to model how attention maps evolve across layers, and it reports significant performance improvements over state-of-the-art models in image classification, natural language understanding, and machine translation.

The Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However, they are learned independently in each layer and sometimes fail to capture precise patterns. In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers. On the one hand, the attention maps in different layers share common knowledge, so the ones in preceding layers can instruct the attention in succeeding layers through residual connections. On the other hand, low-level and high-level attention maps vary in their level of abstraction, so we adopt convolutional layers to model the evolutionary process of attention maps. The proposed evolving attention mechanism achieves significant performance improvements over various state-of-the-art models on multiple tasks, including image classification, natural language understanding, and machine translation.
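
The two ideas in the abstract (a residual link from the previous layer's attention map, followed by a convolution over the map) can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: the function names `conv3x3` and `evolve_attention`, the mixing weights `alpha` and `beta`, the 3x3 kernel size, and the choice to mix pre-softmax logits are all hypothetical choices made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the key dimension.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv3x3(maps, kernel):
    """Zero-padded 3x3 cross-correlation that treats heads as channels.

    maps:   (heads_in, n, m) attention logits
    kernel: (heads_out, heads_in, 3, 3) weights (hypothetical shape)
    """
    _, n, m = maps.shape
    padded = np.pad(maps, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((kernel.shape[0], n, m))
    for di in range(3):
        for dj in range(3):
            window = padded[:, di:di + n, dj:dj + m]
            # Mix input heads into output heads for this kernel offset.
            out += np.einsum("oi,inm->onm", kernel[:, :, di, dj], window)
    return out

def evolve_attention(cur_logits, prev_logits, kernel, alpha=0.5, beta=0.5):
    """Combine the current layer's logits with the previous layer's.

    alpha and beta are assumed mixing weights, not values from the paper.
    """
    if prev_logits is None:
        # First layer: nothing to inherit.
        return cur_logits
    # Residual connection: earlier attention guides the current layer.
    mixed = alpha * prev_logits + (1 - alpha) * cur_logits
    # Convolution models how the map changes between abstraction levels.
    return beta * conv3x3(mixed, kernel) + (1 - beta) * mixed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heads, n = 2, 4
    kernel = rng.normal(scale=0.1, size=(heads, heads, 3, 3))
    prev = None
    for layer in range(3):
        cur = rng.normal(size=(heads, n, n))  # stand-in for Q K^T / sqrt(d)
        prev = evolve_attention(cur, prev, kernel)
        weights = softmax(prev)  # each row sums to 1
        print(layer, weights.shape)
```

Mixing logits before the softmax keeps each row of the final attention weights a valid probability distribution regardless of `alpha`, `beta`, or the kernel values, which is why the sketch applies the softmax last.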

Code Implementations: 2 repos
