SemST: Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment
This work addresses semantic inconsistency in unsupervised image-to-image translation for computer vision applications, offering an incremental improvement over existing methods.
The paper tackles semantic distortion in unsupervised image-to-image translation by proposing SemST, which uses contrastive learning and structure-texture alignment to maintain semantic consistency. SemST achieves state-of-the-art performance and shows applicability to domain adaptation and to pre-training for semantic segmentation.
Unsupervised image-to-image (I2I) translation learns a cross-domain mapping that transfers an input from the source domain to an output in the target domain while preserving its semantics. One challenge is that differing semantic statistics in the source and target domains cause a content discrepancy known as semantic distortion. To address this problem, this work proposes SemST, a novel I2I method that maintains semantic consistency during translation. SemST reduces semantic distortion by employing contrastive learning and by aligning the structural and textural properties of the input and output through maximizing their mutual information. Furthermore, a multi-scale approach is introduced to enhance translation performance, which makes SemST applicable to domain adaptation on high-resolution images. Experiments show that SemST effectively mitigates semantic distortion and achieves state-of-the-art performance. The application of SemST to domain adaptation (DA) is also explored: preliminary experiments demonstrate that SemST can serve as beneficial pre-training for the semantic segmentation task.
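To make "contrastive learning that maximizes mutual information between input and output" concrete, here is a minimal sketch of a patch-wise InfoNCE loss. It assumes a PatchNCE-style formulation (as popularized by contrastive unpaired translation): a feature from an output patch should match the feature from the input patch at the same location (the positive) rather than features from other locations (the negatives). The abstract does not specify SemST's actual loss, how structural and textural features are extracted, or how scales are weighted, so the names `info_nce`, `multi_scale_info_nce`, `tau`, and the per-scale weighting are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def info_nce(query, keys, tau=0.07):
    """Patch-wise InfoNCE loss (hypothetical PatchNCE-style sketch, not SemST's code).

    query: (N, D) features of N output patches.
    keys:  (N, D) features of the N input patches at the same locations.
    Row i of `keys` is the positive for row i of `query`; all other rows are negatives.
    """
    # L2-normalize so the dot product is cosine similarity.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / tau  # (N, N) similarity matrix, scaled by temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (same-location pairs) as targets.
    return -np.mean(np.diag(log_prob))


def multi_scale_info_nce(feature_pairs, weights=None, tau=0.07):
    """Weighted average of InfoNCE over several feature scales (hypothetical weighting)."""
    if weights is None:
        weights = [1.0] * len(feature_pairs)
    total = sum(w * info_nce(q, k, tau) for w, (q, k) in zip(weights, feature_pairs))
    return total / sum(weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(64, 128))                    # stand-in input-patch features
    query = keys + 0.1 * rng.normal(size=(64, 128))      # output features close to inputs
    print("aligned:", info_nce(query, keys))
    print("mismatched:", info_nce(query, np.roll(keys, 1, axis=0)))
    print("two scales:", multi_scale_info_nce([(query, keys), (query[:16], keys[:16])]))
```

Minimizing this loss tightens the standard InfoNCE lower bound on mutual information, I(query; keys) ≥ log N − loss, which is why contrastive objectives of this kind are used to keep translated content tied to its source. Averaging over several scales, as in the second function, is one simple way a multi-scale variant could be assembled.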