IV CV · Oct 11, 2021

Vit-GAN: Image-to-image Translation with Vision Transformers and Conditional GANs

arXiv:2110.09305v1 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses image-to-image translation for computer vision applications, but it is incremental as it builds on prior generator-based models.

The paper tackles image-to-image translation tasks, such as semantic segmentation and depth perception, by proposing Vit-GAN, which combines vision transformers and conditional GANs to produce more realistic results than common architectures.

In this paper, we have developed a general-purpose architecture, Vit-GAN, capable of performing most image-to-image translation tasks, from semantic image segmentation to single-image depth perception. This paper is a follow-up to, and an extension of, the generator-based model [1], whose results were very promising and opened the possibility of further improvement with an adversarial architecture. We use a unique vision-transformer-based generator architecture and conditional GANs (cGANs) with a Markovian discriminator (PatchGAN); the code is available at https://github.com/YigitGunduc/vit-gan. In the present work, we use images as the conditioning input. The obtained results are observed to be more realistic than those of commonly used architectures.
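To make the term "Markovian discriminator (PatchGAN)" concrete: such a discriminator scores overlapping image patches rather than the whole image, so its output is a grid of real/fake decisions. The sketch below computes the patch size (receptive field) and output grid size for a stack of convolutions. The layer layout is an assumption borrowed from the widely used 70×70 PatchGAN configuration in pix2pix-style implementations; the abstract does not state which configuration Vit-GAN uses, and the names `PATCHGAN_LAYERS`, `receptive_field`, and `output_size` are hypothetical.

```python
# Hypothetical layer layout as (kernel, stride, padding) per convolution.
# This follows the common 70x70 PatchGAN configuration from pix2pix-style
# implementations; it is an assumption, not taken from the Vit-GAN paper.
PATCHGAN_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]


def receptive_field(layers):
    """Side length of the input patch seen by one output unit."""
    rf = 1
    # Walk from the last layer back to the input, growing the patch.
    for kernel, stride, _padding in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_size(input_size, layers):
    """Side length of the discriminator's output grid for a square input."""
    size = input_size
    for kernel, stride, padding in layers:
        # Standard convolution output-size formula (floor division).
        size = (size + 2 * padding - kernel) // stride + 1
    return size


if __name__ == "__main__":
    print("patch size:", receptive_field(PATCHGAN_LAYERS))
    print("output grid for a 256x256 input:", output_size(256, PATCHGAN_LAYERS))
```

Under this assumed layout, each output unit judges one 70×70 patch, and a 256×256 input yields a 30×30 grid of decisions. In a conditional setup, the conditioning image and the candidate output are typically concatenated channel-wise before the first convolution, which changes the input channel count but not these spatial sizes.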
