Diff-CA: Separating Common and Salient Factors with Diffusion Models
This work addresses the problem of contrastive analysis for high-fidelity image generation, offering a diffusion-based alternative to VAEs/GANs that improves reconstruction quality and factor separation.
Diff-CA introduces a conditioning framework for diffusion models that separates common and salient factors between two data distributions, enabling high-fidelity image generation and editing. The method achieves effective decomposition without compromising generation quality, as demonstrated through targeted operations like swapping or interpolating salient factors.
Contrastive Analysis aims to separate factors that are common between two data distributions from those that are salient to only one of them. Existing contrastive methods are based on generative models (e.g., VAEs or GANs) that often suffer from limited reconstruction and image quality, which hampers effective latent factor separation and limits their applicability to high-fidelity image generation and editing. We propose a novel conditioning framework for diffusion models that enables contrastive decomposition without compromising generation quality. We first train a prompt-free, image-conditioned diffusion model, and then use weak supervision to learn to decompose the conditioning into a common factor and a salient factor. We prove that the additive contrastive factorization, commonly assumed in prior work, is identifiable under mild conditions. This factorization enables targeted operations by swapping or interpolating only the salient factor.
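
As a rough illustration of what swapping or interpolating only the salient factor could look like, the sketch below assumes the conditioning is a plain vector formed as the sum of a common part and a salient part. The function names (`compose`, `swap_salient`, `interpolate_salient`), the vector dimension, and the random placeholder vectors are all hypothetical; in the actual method the factors would come from learned encoders and the composed conditioning would be fed to the diffusion model, neither of which is modeled here.

```python
import numpy as np


def compose(common, salient):
    """Assumed additive conditioning: c = common + salient."""
    return common + salient


def swap_salient(common_a, salient_b):
    """Keep the common factor of image A, take the salient factor of image B."""
    return compose(common_a, salient_b)


def interpolate_salient(common, salient_start, salient_end, alpha):
    """Linearly blend two salient factors while holding the common factor fixed."""
    salient = (1.0 - alpha) * salient_start + alpha * salient_end
    return compose(common, salient)


# Placeholder factors standing in for encoder outputs (hypothetical values).
rng = np.random.default_rng(0)
dim = 8
common_a, salient_a = rng.normal(size=dim), rng.normal(size=dim)
common_b, salient_b = rng.normal(size=dim), rng.normal(size=dim)

# Conditioning for "image A with the salient content of image B".
swapped = swap_salient(common_a, salient_b)

# Conditioning halfway between the salient content of A and of B.
halfway = interpolate_salient(common_a, salient_a, salient_b, 0.5)
```

Under this additive assumption, both operations leave the common part of the conditioning untouched, which is what makes the edit targeted.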