Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models
This work addresses the lack of semantic correspondence in appearance transfer with diffusion models, offering an incremental improvement over existing methods.
The paper tackles training-free appearance transfer between images using diffusion models, achieving superior results in preserving target structure and reference color even for unaligned images.
As pre-trained text-to-image diffusion models have become a useful tool for image synthesis, people want to specify the results in various ways. This paper tackles training-free appearance transfer, which produces an image that has the structure of a target image and the appearance of a reference image. Existing methods usually do not reflect semantic correspondence, because they rely on query-key similarity within the self-attention layer to establish correspondences between images. To address this, we propose explicitly rearranging the features according to dense semantic correspondences. Extensive experiments show the superiority of our method in various aspects: it preserves the structure of the target and reflects the correct color from the reference, even when the two images are not aligned.
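To make the idea of rearranging features by dense correspondence concrete, the following is a minimal sketch and not the paper's implementation. It assumes each spatial location in both images already has a semantic descriptor (how these are obtained is not specified here), and it assumes correspondence is taken as the nearest neighbour under cosine similarity. The function name `rearrange_by_correspondence` and all array names are hypothetical.

```python
import numpy as np


def rearrange_by_correspondence(target_desc, reference_desc, reference_feat):
    """Gather reference features at each target location's best semantic match.

    target_desc:    (N_t, D) semantic descriptors, one per target location
    reference_desc: (N_r, D) semantic descriptors, one per reference location
    reference_feat: (N_r, C) features to transfer from the reference
    Returns the rearranged features (N_t, C) and the match indices (N_t,).
    """
    # L2-normalise so that the dot product below is cosine similarity
    # (assumption: nearest neighbour in cosine similarity defines the match).
    t = target_desc / np.linalg.norm(target_desc, axis=1, keepdims=True)
    r = reference_desc / np.linalg.norm(reference_desc, axis=1, keepdims=True)

    similarity = t @ r.T                 # (N_t, N_r)
    match = similarity.argmax(axis=1)    # best reference index per target location

    # Explicit rearrangement: a hard gather instead of an attention-weighted mix.
    return reference_feat[match], match


# Toy example: the reference holds the same four "parts" as the target,
# but at shuffled positions (the two images are not aligned).
target_desc = np.eye(4)
perm = np.array([2, 0, 3, 1])
reference_desc = target_desc[perm]             # reference location j shows part perm[j]
reference_feat = perm.reshape(-1, 1) * 1.0     # feature value encodes the part id

rearranged, match = rearrange_by_correspondence(
    target_desc, reference_desc, reference_feat
)
```

After the gather, row `i` of `rearranged` holds the reference feature of the part that sits at target location `i`, regardless of where that part appears in the reference. This is the contrast with relying on query-key similarity alone, where nothing forces the matched locations to be semantically the same part.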