CVNov 20, 2022

Rethinking the Paradigm of Content Constraints in Unpaired Image-to-Image Translation

Xiuding Cai, Yaoyao Zhu, Dong Miao, Linjie Fu, Yu Yao

arXiv:2211.10867v36.516 citationsh-index: 7Has Code

Originality Incremental advance

AI Analysis

This addresses content preservation issues in unpaired image translation, which is crucial for applications like style transfer and domain adaptation, though it is incremental as it builds on existing GAN-based approaches.

The paper tackles the problem of model collapse in unpaired image-to-image translation by proposing EnCo, a method that constrains content using patch-level feature similarity in the generator's latent space, achieving state-of-the-art results on multiple datasets.

In an unpaired setting, lacking sufficient content constraints for image-to-image translation (I2I) tasks, GAN-based approaches are usually prone to model collapse. Current solutions can be divided into two categories, reconstruction-based and Siamese network-based. The former requires that the transformed or transforming image can be perfectly converted back to the original image, which is sometimes too strict and limits the generative performance. The latter involves feeding the original and generated images into a feature extractor and then matching their outputs. This is not efficient enough, and a universal feature extractor is not easily available. In this paper, we propose EnCo, a simple but efficient way to maintain the content by constraining the representational similarity in the latent space of patch-level features from the same stage of the \textbf{En}coder and de\textbf{Co}der of the generator. For the similarity function, we use a simple MSE loss instead of contrastive loss, which is currently widely used in I2I tasks. Benefits from the design, EnCo training is extremely efficient, while the features from the encoder produce a more positive effect on the decoding, leading to more satisfying generations. In addition, we rethink the role played by discriminators in sampling patches and propose a discriminative attention-guided (DAG) patch sampling strategy to replace random sampling. DAG is parameter-free and only requires negligible computational overhead, while significantly improving the performance of the model. Extensive experiments on multiple datasets demonstrate the effectiveness and advantages of EnCo, and we achieve multiple state-of-the-art compared to previous methods. Our code is available at https://github.com/XiudingCai/EnCo-pytorch.

View on arXiv PDF Code

Similar