Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge
This work addresses limitations of unpaired image translation in applications such as medical imaging, offering improved generalization and fidelity. The contribution is incremental, however, since it builds on existing diffusion and self-supervised learning techniques.
The paper tackles unpaired image-to-image translation with the Self-Supervised Semantic Bridge (SSB) framework, which integrates semantic priors into diffusion bridge models to achieve spatially faithful translation without cross-domain supervision. SSB outperforms prior methods on medical image synthesis and extends to text-guided editing.
Adversarial diffusion and diffusion-inversion methods have advanced unpaired image-to-image translation, but each faces key limitations. Adversarial approaches require a target-domain adversarial loss during training, which can limit generalization to unseen data, while diffusion-inversion methods often produce low-fidelity translations due to imperfect inversion into noise-latent representations. In this work, we propose the Self-Supervised Semantic Bridge (SSB), a versatile framework that integrates external semantic priors into diffusion bridge models to enable spatially faithful translation without cross-domain supervision. Our key idea is to leverage self-supervised visual encoders to learn representations that are invariant to appearance changes but capture geometric structure, forming a shared latent space that conditions the diffusion bridges. Extensive experiments show that SSB outperforms strong prior methods on challenging medical image synthesis in both in-domain and out-of-domain settings, and extends easily to high-quality text-guided editing.
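The abstract does not spell out SSB's bridge formulation. For readers unfamiliar with diffusion bridges, here is a minimal sketch of the generic Brownian-bridge marginal that many bridge models build on. It is a standard construction, not necessarily the one SSB uses: a noisy interpolation pinned at two endpoints, with noise variance sigma^2 * t * (1 - t). The function name `brownian_bridge_sample`, the parameter `sigma`, and the idea of using a semantic-feature tensor of matching shape as the second endpoint are all assumptions made for illustration.

```python
import numpy as np


def brownian_bridge_sample(x0, x1, t, sigma=1.0, rng=None):
    """Draw x_t from a Brownian-bridge marginal pinned at x0 (t=0) and x1 (t=1).

    x_t = (1 - t) * x0 + t * x1 + sigma * sqrt(t * (1 - t)) * eps,  eps ~ N(0, I)
    Generic construction; assumed here, not taken from the SSB paper.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    if x0.shape != x1.shape:
        raise ValueError("bridge endpoints must have the same shape")
    eps = rng.standard_normal(x0.shape)
    mean = (1.0 - t) * x0 + t * x1          # linear interpolation between endpoints
    std = sigma * np.sqrt(t * (1.0 - t))    # zero at t=0 and t=1, largest at t=0.5
    return mean + std * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((64, 64))  # stand-in for a target-domain image
    cond = rng.standard_normal((64, 64))   # hypothetical: semantic features projected to image shape
    x_mid = brownian_bridge_sample(image, cond, t=0.5, rng=rng)
    print(x_mid.shape)
```

Because the noise term vanishes at both ends, a sampled path starts exactly at one endpoint and finishes exactly at the other. One possible reading of "without cross-domain supervision" is that, with endpoints drawn from a shared appearance-invariant latent space, each domain's bridge could be trained on that domain alone. The abstract does not confirm this reading.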