Diffusion Counterfactual Generation with Semantic Abduction
This work addresses counterfactual image editing for applications that require high fidelity and identity preservation. It is an incremental advance that adapts diffusion models to causal reasoning.
The paper tackles counterfactual image generation with a diffusion-based framework that integrates semantic representations for causal control, improving the trade-off between faithful causal editing and identity preservation.
Counterfactual image generation presents significant challenges, including preserving identity, maintaining perceptual quality, and ensuring faithfulness to an underlying causal model. While existing auto-encoding frameworks admit semantic latent spaces that can be manipulated for causal control, they struggle with scalability and fidelity. Advances in diffusion models, which have demonstrated state-of-the-art visual quality, human-aligned perception and representation learning capabilities, present opportunities for improving counterfactual image editing. Here, we present a suite of diffusion-based causal mechanisms, introducing the notions of spatial, semantic and dynamic abduction. We propose a general framework that integrates semantic representations into diffusion models through the lens of Pearlian causality to edit images via a counterfactual reasoning process. To our knowledge, this is the first work to consider high-level semantic identity preservation for diffusion counterfactuals and to demonstrate how semantic control enables principled trade-offs between faithful causal control and identity preservation.
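For readers unfamiliar with the counterfactual reasoning process the abstract refers to, the standard Pearlian recipe has three steps: abduction (infer the exogenous noise from the observation), action (intervene on a parent variable), and prediction (recompute the outcome with the inferred noise held fixed). The following is a minimal sketch on a toy scalar mechanism, not the paper's diffusion-based mechanisms. The class `ToySCM`, its `weight` parameter, and the additive-noise form are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToySCM:
    """Hypothetical additive-noise mechanism: x = weight * parent + noise."""

    weight: float = 2.0

    def forward(self, parent: float, noise: float) -> float:
        # Structural assignment: the child is generated from parent and noise.
        return self.weight * parent + noise

    def abduct(self, parent: float, x: float) -> float:
        # Invert the mechanism to recover the exogenous noise that,
        # together with the observed parent, explains the observed x.
        return x - self.weight * parent


def counterfactual(scm: ToySCM, parent_obs: float, x_obs: float, parent_cf: float) -> float:
    # 1. Abduction: infer the noise consistent with the observation.
    noise = scm.abduct(parent_obs, x_obs)
    # 2. Action: replace the observed parent with the intervened value.
    # 3. Prediction: push the same noise through the mechanism again.
    return scm.forward(parent_cf, noise)


scm = ToySCM(weight=2.0)
x_cf = counterfactual(scm, parent_obs=1.0, x_obs=2.5, parent_cf=3.0)
print(x_cf)  # 6.5
```

In this toy setting the abducted noise plays the role that identity-carrying information plays in image counterfactuals: it is whatever must stay fixed when the parent is changed. Intervening with the parent's observed value returns the original observation unchanged.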