CV · Jun 16, 2021

Invertible Attention

arXiv:2106.09003v2 · 8 citations · Has Code
AI Analysis

This work removes a bottleneck for researchers and practitioners working with invertible neural networks: it lets attention mechanisms, which are crucial for capturing long-range dependencies, be integrated without losing invertibility. It represents an incremental advance in the field.

The paper makes attention compatible with invertible networks by proposing invertible attention, in which invertibility is achieved by constraining the Lipschitz constant of the attention block. It validates invertibility on image reconstruction with CIFAR-10, SVHN, and CelebA, and reports performance similar to non-invertible attention on dense prediction tasks.

Attention has proven to be an efficient mechanism for capturing long-range dependencies. However, so far it has not been deployed in invertible networks. This is because, for a network to be invertible, every component within it must be a bijective transformation, and a normal attention block is not. In this paper, we propose invertible attention that can be plugged into existing invertible models. We show, both mathematically and experimentally, that the invertibility of an attention model can be achieved by carefully constraining its Lipschitz constant. We validate the invertibility of our invertible attention on the image reconstruction task with three popular datasets: CIFAR-10, SVHN, and CelebA. We also show that our invertible attention achieves performance similar to normal non-invertible attention on dense prediction tasks. The code is available at https://github.com/Schwartz-Zha/InvertibleAttention
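To illustrate how a Lipschitz constraint can yield invertibility, here is a minimal NumPy sketch. It assumes a residual form y = x + g(x) with the Lipschitz constant of g kept below 1, so that x can be recovered from y by fixed-point iteration. The abstract does not spell out the construction, so the residual form is an assumption, and the branch `g` below is a simple tanh layer standing in for an attention block, not the paper's attention. The names `make_contractive_branch`, `forward`, and `inverse` are hypothetical and are not taken from the linked repository.

```python
import numpy as np


def make_contractive_branch(dim, lipschitz_bound=0.5, seed=0):
    """Build a stand-in residual branch g with Lip(g) <= lipschitz_bound < 1.

    Assumption: this is NOT attention. It is a tanh layer whose weight
    matrix is rescaled to spectral norm 1; tanh is 1-Lipschitz, so the
    whole branch is bounded by lipschitz_bound.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((dim, dim))
    w /= np.linalg.norm(w, 2)  # ord=2 on a matrix is the largest singular value

    def g(x):
        return lipschitz_bound * np.tanh(x @ w.T)

    return g


def forward(x, g):
    """Residual map y = x + g(x)."""
    return x + g(x)


def inverse(y, g, num_iters=100):
    """Recover x from y by iterating x <- y - g(x).

    Because g is a contraction, the iteration converges to the unique
    solution of x + g(x) = y (Banach fixed-point theorem).
    """
    x = y.copy()
    for _ in range(num_iters):
        x = y - g(x)
    return x


g = make_contractive_branch(dim=8)
x = np.random.default_rng(1).standard_normal((4, 8))
y = forward(x, g)
x_rec = inverse(y, g)
max_err = float(np.max(np.abs(x - x_rec)))
print("max reconstruction error:", max_err)
```

With a bound of 0.5, the worst-case error at least halves at every step, so a fixed iteration count is enough here. A bound closer to 1 would make the branch more expressive but would slow the inverse.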

Code Implementations · 1 repo