DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval
This work addresses the challenge of retrieving images across diverse domains without annotations. The contribution is incremental, as it builds on feature disentanglement methods.
The paper tackles unsupervised cross-domain image retrieval by proposing DUDE, which uses a text-to-image generative model to disentangle object features from domain-specific styles and then aligns the disentangled features progressively, from within domains to across domains. DUDE achieves state-of-the-art performance on three benchmark datasets spanning 13 domains.
Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images of the same category across diverse domains without relying on annotations. Existing UCIR methods, which align cross-domain features for the entire image, often struggle with the domain gap because the object features critical for retrieval are frequently entangled with domain-specific styles. To address this challenge, we propose DUDE, a novel UCIR method built upon feature disentanglement. In brief, DUDE leverages a text-to-image generative model to disentangle object features from domain-specific styles, thus facilitating semantic image retrieval. To further achieve reliable alignment of the disentangled object features, DUDE aligns mutual neighbors progressively, first within domains and then across domains. Extensive experiments demonstrate that DUDE achieves state-of-the-art performance on three benchmark datasets spanning 13 domains. The code will be released.
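To make the idea of mutual-neighbor alignment concrete, the following is a minimal sketch and not the authors' implementation. It assumes that object features are already available as NumPy arrays, that similarity is measured by cosine similarity, and that two samples count as mutual neighbors when each appears in the other's top-k list. The function name `mutual_neighbors`, the parameter `k`, and the `exclude_self` flag are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mutual_neighbors(feats_a, feats_b, k=1, exclude_self=False):
    """Return (i, j) pairs where b[j] is among the top-k neighbors of a[i]
    and a[i] is among the top-k neighbors of b[j].

    Set exclude_self=True when feats_a and feats_b are the same set
    (within-domain search), so a sample is never paired with itself.
    All names here are illustrative assumptions, not part of DUDE.
    """
    a = l2_normalize(np.asarray(feats_a, dtype=float))
    b = l2_normalize(np.asarray(feats_b, dtype=float))
    sim = a @ b.T  # (n_a, n_b) cosine similarities
    if exclude_self:
        np.fill_diagonal(sim, -np.inf)

    n_a, n_b = sim.shape
    top_ab = np.argsort(-sim, axis=1)[:, : min(k, n_b)]    # a -> b neighbors
    top_ba = np.argsort(-sim.T, axis=1)[:, : min(k, n_a)]  # b -> a neighbors

    a_to_b = np.zeros((n_a, n_b), dtype=bool)
    a_to_b[np.arange(n_a)[:, None], top_ab] = True
    b_to_a = np.zeros((n_a, n_b), dtype=bool)
    b_to_a[top_ba, np.arange(n_b)[:, None]] = True

    mutual = a_to_b & b_to_a
    if exclude_self:
        np.fill_diagonal(mutual, False)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(mutual))]


# Toy features standing in for disentangled object features of two domains.
photo = np.array([[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]])
sketch = np.array([[0.9, 0.1], [0.1, 0.9]])

# Stage 1 (within a domain): pair samples inside the photo domain.
within_pairs = mutual_neighbors(photo, photo, k=1, exclude_self=True)

# Stage 2 (across domains): pair photo samples with sketch samples.
cross_pairs = mutual_neighbors(photo[[0, 2]], sketch, k=1)
```

In a training loop, pairs like these would typically serve as positives for an alignment loss, with the within-domain stage applied before the cross-domain stage. How DUDE schedules the two stages and which loss it uses are not stated in the abstract, so those details are left out here.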