CV · May 24, 2023

Unsupervised Semantic Correspondence Using Stable Diffusion

arXiv:2305.15581v2 · 142 citations
Originality: Incremental advance
AI Analysis

This paper addresses semantic correspondence, a core computer vision task, with an unsupervised approach that reduces the need for labeled data. The advance is incremental because it builds on existing pre-trained diffusion models.

The paper finds semantic correspondences across images without supervision by leveraging the semantic knowledge in pre-trained text-to-image diffusion models. It reports results on par with the strongly supervised state of the art on PF-Willow and outperforms existing weakly supervised and unsupervised methods by 20.9% (relative) on SPair-71k.

Text-to-image diffusion models are now capable of generating images that are often indistinguishable from real images. To generate such images, these models must understand the semantics of the objects they are asked to generate. In this work we show that, without any training, one can leverage this semantic knowledge within diffusion models to find semantic correspondences - locations in multiple images that have the same semantic meaning. Specifically, given an image, we optimize the prompt embeddings of these models for maximum attention on the regions of interest. These optimized embeddings capture semantic information about the location, which can then be transferred to another image. By doing so we obtain results on par with the strongly supervised state of the art on the PF-Willow dataset and significantly outperform (20.9% relative for the SPair-71k dataset) any existing weakly or unsupervised method on PF-Willow, CUB-200 and SPair-71k datasets.
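The idea in the abstract (optimize an embedding so that attention peaks at a chosen location, then reuse that embedding on a second image) can be illustrated with a toy sketch. This is not the paper's implementation. It assumes each image is already reduced to a list of per-pixel feature vectors, uses a single softmax attention map in place of Stable Diffusion's cross-attention layers, and uses a cross-entropy loss chosen here for simplicity. The names `attention_map`, `optimize_embedding`, and `find_correspondence` are hypothetical.

```python
import math
import random


def attention_map(features, embedding):
    """Softmax over pixels of the scaled dot product with one token embedding."""
    scale = math.sqrt(len(embedding))
    logits = [sum(f * e for f, e in zip(row, embedding)) / scale for row in features]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def optimize_embedding(features, query_idx, steps=300, lr=0.5):
    """Gradient descent on -log(attention at query_idx), starting from zeros.

    Assumption: cross-entropy stands in for whatever loss the paper uses.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    embedding = [0.0] * dim
    for _ in range(steps):
        attn = attention_map(features, embedding)
        # Gradient of the loss w.r.t. the logits is (attention - one_hot).
        residual = [a - (1.0 if i == query_idx else 0.0) for i, a in enumerate(attn)]
        for k in range(dim):
            grad_k = sum(r * row[k] for r, row in zip(residual, features)) / scale
            embedding[k] -= lr * grad_k
    return embedding


def find_correspondence(source_features, target_features, query_idx):
    """Optimize on the source image, then take the attention peak in the target."""
    embedding = optimize_embedding(source_features, query_idx)
    attn = attention_map(target_features, embedding)
    return max(range(len(attn)), key=attn.__getitem__)


# Toy data: the "target image" is the source with pixels shuffled plus small noise.
rng = random.Random(0)
num_pixels, dim = 32, 16
source = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_pixels)]
order = list(range(num_pixels))
rng.shuffle(order)
target = [[v + 0.01 * rng.gauss(0, 1) for v in source[j]] for j in order]

query = 5
match = find_correspondence(source, target, query)
expected = order.index(query)  # target pixel that was copied from source pixel 5
print(match, expected)
```

In this sketch the embedding is learned from the source image alone; the target image is only used at inference time, which mirrors the "without any training" transfer step described above.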

Code Implementations: 1 repo