LG · Mar 3, 2025

CoInD: Enabling Logical Compositions in Diffusion Models

arXiv:2503.01145v1 · 7 citations · h-index: 33 · ICLR
Originality: Incremental advance
AI Analysis

This work addresses a key limitation of diffusion-based controlled synthesis in settings with complex logical constraints. It is incremental in that it builds on existing diffusion model frameworks.

The paper tackles the problem of generating data with arbitrary logical compositions of attributes in diffusion models. It shows that standard conditional diffusion models violate the statistical independence assumption on which compositional sampling relies, especially when training data is limited. The proposed CoInD method enforces this independence by minimizing Fisher's divergence between the joint and marginal distributions, yielding significantly more faithful and controlled generation. The benefits are most pronounced for NOT operations and when only a subset of attribute compositions is observed during training.
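To make "composition of conditional marginals" concrete, here is a minimal sketch of the score arithmetic that independence-based samplers typically use. Under the independence assumption, log p(x | c1, c2) = log p(x) + Σᵢ [log p(x | cᵢ) − log p(x)] + const, so the composed score is the unconditional score plus each attribute's guidance direction. The `ScoreFn` interface, `compose_score`, `neg_weight`, and the Gaussian `toy_score` are hypothetical stand-ins, not the paper's code, and the NOT rule shown (subtracting the negated attribute's guidance direction) is a common heuristic; CoInD's exact NOT formulation may differ.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface: score(x, attrs) approximates grad_x log p(x | attrs).
# Omitting an attribute from `attrs` means "not conditioned on it", as with a
# single network trained with randomly dropped conditions.
ScoreFn = Callable[[Vector, Dict[str, int]], Vector]


def compose_score(
    score: ScoreFn,
    x: Vector,
    positives: Sequence[Tuple[str, int]],
    negatives: Sequence[Tuple[str, int]] = (),
    neg_weight: float = 1.0,
) -> Vector:
    """Score of a logical composition built from conditional marginals.

    AND (valid under the independence assumption):
        grad log p(x | c1, c2) = grad log p(x) + sum_i [grad log p(x | c_i) - grad log p(x)]
    NOT (heuristic): subtract the negated attribute's guidance direction,
        scaled by `neg_weight`.
    """
    uncond = score(x, {})
    total = list(uncond)
    for name, value in positives:
        cond = score(x, {name: value})
        total = [t + c - u for t, c, u in zip(total, cond, uncond)]
    for name, value in negatives:
        cond = score(x, {name: value})
        total = [t - neg_weight * (c - u) for t, c, u in zip(total, cond, uncond)]
    return total


def toy_score(x: Vector, attrs: Dict[str, int]) -> Vector:
    """Hypothetical stand-in for a learned score: p(x | attrs) = N(mean, I).

    Attribute a=1 shifts the first coordinate, b=1 the second; the score of
    N(mean, I) is mean - x.
    """
    mean = [0.0, 0.0]
    if attrs.get("a") == 1:
        mean[0] += 1.0
    if attrs.get("b") == 1:
        mean[1] += 1.0
    return [m - xi for m, xi in zip(mean, x)]


# a=1 AND b=1 at the origin: pulls toward mean (1, 1).
print(compose_score(toy_score, [0.0, 0.0], [("a", 1), ("b", 1)]))
# a=1 AND NOT b=1: pulls toward a=1 and away from b=1.
print(compose_score(toy_score, [0.0, 0.0], [("a", 1)], negatives=[("b", 1)]))
```

In a real sampler this composed score (or the equivalent noise prediction) would replace the single conditional score at every denoising step; the toy ignores the noise level t for brevity.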

How can we learn generative models to sample data with arbitrary logical compositions of statistically independent attributes? The prevailing solution is to sample from distributions expressed as a composition of attributes' conditional marginal distributions under the assumption that they are statistically independent. This paper shows that standard conditional diffusion models violate this assumption, even when all attribute compositions are observed during training. This violation is significantly more severe when only a subset of the compositions is observed. We propose CoInD to address this problem. It explicitly enforces statistical independence between the conditional marginal distributions by minimizing Fisher's divergence between the joint and marginal distributions. The theoretical advantages of CoInD are reflected in both qualitative and quantitative experiments, demonstrating a significantly more faithful and controlled generation of samples for arbitrary logical compositions of attributes. The benefit is more pronounced for scenarios that current solutions relying on the assumption of conditionally independent marginals struggle with, namely, logical compositions involving the NOT operation and when only a subset of compositions is observed during training.
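One way to read "minimizing Fisher's divergence between the joint and marginal distributions" is as a penalty on the gap between the model's joint conditional score and the score implied by independent marginals, estimated by Monte Carlo over samples. The sketch below is our interpretation of the abstract, not the paper's exact objective: its weighting, noise-level handling, and sampling distribution may differ. `independence_residual`, `fisher_independence_penalty`, and `make_toy_score` are hypothetical names, and the toy model's `interaction` term deliberately breaks independence.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface: score(x, attrs) approximates grad_x log p(x | attrs);
# an empty `attrs` dict means the unconditional score.
ScoreFn = Callable[[Vector, Dict[str, int]], Vector]


def independence_residual(score: ScoreFn, x: Vector, attrs: Dict[str, int]) -> Vector:
    """Joint conditional score minus the score implied by independent marginals."""
    uncond = score(x, {})
    implied = list(uncond)
    for name, value in attrs.items():
        marginal = score(x, {name: value})
        implied = [acc + m - u for acc, m, u in zip(implied, marginal, uncond)]
    joint = score(x, dict(attrs))
    return [j - i for j, i in zip(joint, implied)]


def fisher_independence_penalty(
    score: ScoreFn, batch: Sequence[Tuple[Vector, Dict[str, int]]]
) -> float:
    """Monte Carlo estimate of E ||residual||^2 over (x, attrs) pairs."""
    if not batch:
        raise ValueError("batch must be non-empty")
    total = 0.0
    for x, attrs in batch:
        r = independence_residual(score, x, attrs)
        total += sum(v * v for v in r)
    return total / len(batch)


def make_toy_score(interaction: float) -> ScoreFn:
    """Hypothetical score model: p(x | attrs) = N(mean, I), score = mean - x.

    `interaction` shifts the mean only when a=1 and b=1 together, so any
    nonzero value makes the joint disagree with the product of marginals.
    """
    def score(x: Vector, attrs: Dict[str, int]) -> Vector:
        mean = [0.0, 0.0]
        if attrs.get("a") == 1:
            mean[0] += 1.0
        if attrs.get("b") == 1:
            mean[1] += 1.0
        if attrs.get("a") == 1 and attrs.get("b") == 1:
            mean[0] += interaction
        return [m - xi for m, xi in zip(mean, x)]
    return score


batch = [([0.0, 0.0], {"a": 1, "b": 1})]
print(fisher_independence_penalty(make_toy_score(0.0), batch))  # independent model
print(fisher_independence_penalty(make_toy_score(0.5), batch))  # interacting model
```

In training, such a penalty would presumably be added, with some weight, to the usual denoising loss and differentiated with an autodiff framework; the pure-Python version here only illustrates what quantity is being driven toward zero.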

Code implementations: 1 repo
