CV · Oct 25, 2021

The Nuts and Bolts of Adopting Transformer in GANs

Rui Xu, Xiangyu Xu, Kai Chen, Bolei Zhou, Chen Change Loy

arXiv:2110.13107v35.64 citations

Originality: Incremental advance

AI Analysis

This work targets image generation with GANs, a problem of interest to computer vision researchers. Its contribution is incremental: it builds on existing Transformer and GAN frameworks.

The paper tackles the challenge of integrating Transformers into GANs for high-fidelity image synthesis. It finds that residual connections in self-attention layers are harmful when training Transformer-based discriminators and conditional generators, and it proposes two designs: a CNN-free generator (STrans-G) that achieves competitive results in unconditional and conditional image generation, and a Transformer-based discriminator (STrans-D) that significantly reduces the performance gap with CNN-based discriminators.

Transformer becomes prevalent in computer vision, especially for high-level vision tasks. However, adopting Transformer in the generative adversarial network (GAN) framework is still an open yet challenging problem. In this paper, we conduct a comprehensive empirical study to investigate the properties of Transformer in GAN for high-fidelity image synthesis. Our analysis highlights and reaffirms the importance of feature locality in image generation, although the merits of the locality are well known in the classification task. Perhaps more interestingly, we find the residual connections in self-attention layers harmful for learning Transformer-based discriminators and conditional generators. We carefully examine the influence and propose effective ways to mitigate the negative impacts. Our study leads to a new alternative design of Transformers in GAN, a convolutional neural network (CNN)-free generator termed as STrans-G, which achieves competitive results in both unconditional and conditional image generations. The Transformer-based discriminator, STrans-D, also significantly reduces its gap against the CNN-based discriminators.
