CV · Mar 30, 2022

ITTR: Unpaired Image-to-Image Translation with Transformers

arXiv:2203.16015v1 · 27 citations
Originality: Incremental advance
AI Analysis

This work improves image-to-image translation for applications such as style transfer and domain adaptation, but the advance is incremental because it builds on existing transformer and CNN methods.

The paper tackles the problem of unpaired image-to-image translation by proposing ITTR, a transformer-based architecture that addresses limitations in capturing long-range dependencies and computational efficiency, achieving state-of-the-art performance on six benchmark datasets.

Unpaired image-to-image translation is to translate an image from a source domain to a target domain without paired training data. By utilizing CNN in extracting local semantics, various techniques have been developed to improve the translation performance. However, CNN-based generators lack the ability to capture long-range dependency to well exploit global semantics. Recently, Vision Transformers have been widely investigated for recognition tasks. Though appealing, it is inappropriate to simply transfer a recognition-based vision transformer to image-to-image translation due to the generation difficulty and the computation limitation. In this paper, we propose an effective and efficient architecture for unpaired Image-to-Image Translation with Transformers (ITTR). It has two main designs: 1) hybrid perception block (HPB) for token mixing from different receptive fields to utilize global semantics; 2) dual pruned self-attention (DPSA) to sharply reduce the computational complexity. Our ITTR outperforms the state-of-the-arts for unpaired image-to-image translation on six benchmark datasets.
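The computational concern raised in the abstract can be made concrete with a little arithmetic. The sketch below is not the paper's DPSA; it only counts query-key score entries for full self-attention over an H x W token map, and compares that with a hypothetical variant in which only a fixed number of key tokens are kept. The function name, the 64 x 64 map size, and the 256 kept keys are all illustrative assumptions.

```python
def attention_score_entries(height, width, kept_keys=None):
    """Count query-key score entries for one attention head.

    Full self-attention scores every token against every token, so the
    count is (H*W)^2. If only `kept_keys` key tokens are retained
    (a hypothetical pruning step, not the paper's exact DPSA rule),
    the count drops to (H*W) * kept_keys.
    """
    num_tokens = height * width
    num_keys = num_tokens if kept_keys is None else kept_keys
    return num_tokens * num_keys


# Illustrative sizes only: a 64 x 64 token map, pruned to 256 keys.
full = attention_score_entries(64, 64)
pruned = attention_score_entries(64, 64, kept_keys=256)
print(full, pruned, full // pruned)  # 16777216 1048576 16
```

Because the full count grows with the square of the number of tokens, doubling the side length of the token map multiplies the cost by 16, which is why reducing the number of tokens entering attention matters for image generation at useful resolutions.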
