IV · CV · Oct 11, 2021

Vit-GAN: Image-to-image Translation with Vision Transformers and Conditional GANs

arXiv:2110.09305v1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses image-to-image translation for computer vision applications, but it is incremental as it builds on prior generator-based models.

The paper tackles image-to-image translation tasks, such as semantic segmentation and depth perception, by proposing Vit-GAN, which combines vision transformers and conditional GANs to produce more realistic results than common architectures.

In this paper, we develop a general-purpose architecture, Vit-GAN, capable of performing most image-to-image translation tasks, from semantic image segmentation to single-image depth perception. This paper is a follow-up to, and an extension of, the generator-based model [1], whose results were very promising and opened the possibility of further improvement with an adversarial architecture. We use a unique vision-transformer-based generator architecture together with conditional GANs (cGANs) and a Markovian discriminator (PatchGAN) (https://github.com/YigitGunduc/vit-gan). In the present work, images serve as the conditioning input. The obtained results are observed to be more realistic than those of commonly used architectures.
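To make the "Markovian discriminator (PatchGAN)" in the abstract concrete, the sketch below computes how large an image patch each discriminator output judges, and how many such outputs a given input produces. The layer list is an assumption: it is the widely used pix2pix-style PatchGAN layout (five 4x4 convolutions with strides 2, 2, 2, 1, 1 and padding 1), which the abstract does not confirm for Vit-GAN. The names `ASSUMED_PATCHGAN_LAYERS`, `receptive_field`, and `output_size` are hypothetical and do not come from the linked repository.

```python
# Each tuple is (kernel_size, stride, padding) for one convolution layer.
# ASSUMPTION: pix2pix-style PatchGAN layout, used here only for illustration;
# the Vit-GAN repository may configure its discriminator differently.
ASSUMED_PATCHGAN_LAYERS = [
    (4, 2, 1),
    (4, 2, 1),
    (4, 2, 1),
    (4, 1, 1),
    (4, 1, 1),
]


def receptive_field(layers):
    """Side length (in input pixels) of the patch seen by one output unit."""
    rf = 1    # a single output unit before any layers covers one pixel
    jump = 1  # distance in input pixels between adjacent units at this depth
    for kernel, stride, _padding in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def output_size(input_size, layers):
    """Side length of the discriminator's output grid for a square input."""
    size = input_size
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


if __name__ == "__main__":
    print(receptive_field(ASSUMED_PATCHGAN_LAYERS))   # 70
    print(output_size(256, ASSUMED_PATCHGAN_LAYERS))  # 30
```

Under these assumed layers, a 256x256 input yields a 30x30 grid of real/fake scores, each covering a 70x70 patch. The discriminator therefore penalizes local texture and structure rather than scoring the whole image with a single number. In the conditional setting, the input image would typically be concatenated with the real or generated output before this stack, which changes the channel count but not the geometry computed here.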

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
