CV Mar 30, 2022

ITTR: Unpaired Image-to-Image Translation with Transformers

Wanfeng Zheng, Qiang Li, Guoxin Zhang, Pengfei Wan, Zhongyuan Wang

arXiv:2203.16015v1 13.627 citations

Originality: Incremental advance

AI Analysis

This work improves image-to-image translation for applications such as style transfer and domain adaptation. The advance is incremental, since it builds on existing transformer and CNN methods.

The paper tackles unpaired image-to-image translation by proposing ITTR, a transformer-based architecture designed to capture long-range dependencies while reducing computational cost. The authors report state-of-the-art performance on six benchmark datasets.

Unpaired image-to-image translation aims to translate an image from a source domain to a target domain without paired training data. Using CNNs to extract local semantics, various techniques have been developed to improve translation performance. However, CNN-based generators lack the ability to capture long-range dependencies and therefore cannot fully exploit global semantics. Recently, Vision Transformers have been widely investigated for recognition tasks. Though appealing, simply transferring a recognition-based vision transformer to image-to-image translation is inappropriate because of the difficulty of generation and the computational cost. In this paper, we propose an effective and efficient architecture for unpaired Image-to-Image Translation with Transformers (ITTR). It has two main designs: 1) a hybrid perception block (HPB), which mixes tokens from different receptive fields to utilize global semantics; 2) dual pruned self-attention (DPSA), which sharply reduces the computational complexity. ITTR outperforms state-of-the-art methods for unpaired image-to-image translation on six benchmark datasets.
