CV | Jul 10, 2022

Horizontal and Vertical Attention in Transformers

arXiv:2207.04399v12.61 citations | h-index: 45 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for better feature learning in Transformers for various supervised tasks, but it is incremental as it builds on existing attention mechanisms.

The paper aims to improve feature representation in Transformers with two mechanisms: horizontal attention, which re-weights the multi-head outputs, and vertical attention, which recalibrates channel-wise features. The authors report improved generalization across supervised learning tasks with minimal computational overhead.

Transformers are built upon multi-head scaled dot-product attention and positional encoding, which aim to learn feature representations and token dependencies. In this work, we focus on enhancing the distinctive representation by learning to augment the feature maps with the self-attention mechanism in Transformers. Specifically, we propose horizontal attention to re-weight the multi-head output of the scaled dot-product attention before dimensionality reduction, and vertical attention to adaptively re-calibrate channel-wise feature responses by explicitly modelling inter-dependencies among different channels. We demonstrate that Transformer models equipped with the two attentions generalize well across different supervised learning tasks, at a very minor additional computational cost. The proposed horizontal and vertical attentions are highly modular and can be inserted into various Transformer models to further improve performance. Our code is available in the supplementary material.
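
The abstract names the two mechanisms but does not give their equations, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes horizontal attention can be approximated by a softmax over per-head scalar descriptors, and vertical attention by a squeeze-and-excitation style sigmoid gate over channels. The names `horizontal_reweight`, `vertical_recalibrate`, `head_params`, and `gate_matrix` are hypothetical, and the output projection that performs the dimensionality reduction is omitted.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def horizontal_reweight(heads, head_params):
    """Scale each head's output by a weight, then concatenate the heads.

    heads: list of H matrices, each [n_tokens][d_head].
    head_params: H scalars standing in for learned parameters (assumption).
    Returns the concatenated [n_tokens][H * d_head] matrix and the weights.
    """
    n_tokens, d_head = len(heads[0]), len(heads[0][0])
    # One scalar descriptor per head: the mean of all its activations.
    descriptors = [sum(sum(row) for row in h) / (n_tokens * d_head) for h in heads]
    # Assumed scoring rule: parameter times descriptor, normalized over heads.
    weights = softmax([p * d for p, d in zip(head_params, descriptors)])
    mixed = []
    for t in range(n_tokens):
        row = []
        for head, a in zip(heads, weights):
            row.extend(a * v for v in head[t])
        mixed.append(row)
    return mixed, weights


def vertical_recalibrate(x, gate_matrix):
    """Gate each channel of x ([n_tokens][n_channels]) with a value in (0, 1).

    gate_matrix: [n_channels][n_channels] stand-in for learned weights that
    model inter-channel dependencies (assumption).
    """
    n_tokens, n_channels = len(x), len(x[0])
    # Squeeze: average each channel over all tokens.
    squeezed = [sum(x[t][c] for t in range(n_tokens)) / n_tokens
                for c in range(n_channels)]
    # Excite: each gate depends on every channel's squeezed statistic.
    gates = [sigmoid(sum(gate_matrix[c][k] * squeezed[k] for k in range(n_channels)))
             for c in range(n_channels)]
    out = [[x[t][c] * gates[c] for c in range(n_channels)] for t in range(n_tokens)]
    return out, gates


# Toy input: 2 heads, 2 tokens, 2 dimensions per head.
heads = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
mixed, head_weights = horizontal_reweight(heads, head_params=[1.0, 1.0])
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
recalibrated, gates = vertical_recalibrate(mixed, identity)
print(head_weights)
print(gates)
```

In this sketch both steps keep the tensor shape unchanged, which is consistent with the abstract's claim that the modules can be inserted into existing Transformer models. The specific scoring and gating functions here are placeholders and may differ from those in the paper.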
