LG · AI · CV · May 19, 2025

Improving Compositional Generation with Diffusion Models Using Lift Scores

arXiv:2505.13740v2 · 1 citation · h-index: 6 · Has Code · ICML
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating images that satisfy multiple conditions simultaneously, which matters for applications such as text-to-image synthesis. The contribution appears incremental because it builds on existing diffusion models.

The paper tackles compositional generation in diffusion models by introducing a novel resampling criterion based on lift scores, which significantly improves condition alignment across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis.

We introduce a novel resampling criterion using lift scores for improving compositional generation in diffusion models. By leveraging the lift scores, we evaluate whether generated samples align with each single condition and then compose the results to determine whether the composed prompt is satisfied. Our key insight is that lift scores can be efficiently approximated using only the original diffusion model, requiring no additional training or external modules. We develop an optimized variant that achieves lower computational overhead during inference while maintaining effectiveness. Through extensive experiments, we demonstrate that lift scores significantly improve condition alignment for compositional generation across 2D synthetic data, CLEVR position tasks, and text-to-image synthesis. Our code is available at http://rainorangelemon.github.io/complift.
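To make the resampling idea concrete, here is a minimal sketch of how a lift-style check might be implemented. It rests on several assumptions that the abstract does not spell out: the lift of a sample under a condition is taken to be log p(x|c) - log p(x), and it is approximated by an unweighted Monte Carlo average of the difference between unconditional and conditional denoising errors. The names `lift_score`, `satisfies_all`, `toy_eps_model`, and `ALPHAS_BAR` are hypothetical, and the toy noise predictor stands in for a trained diffusion model. This is not the authors' implementation.

```python
import numpy as np

# Hypothetical noise schedule (cumulative alpha products), for the toy only.
ALPHAS_BAR = np.linspace(0.99, 0.5, 10)

def lift_score(eps_model, x0, cond, alphas_bar, n_trials=64, rng=None):
    """Monte Carlo estimate of a lift-style score for sample x0 under cond.

    Assumed estimator: mean of (unconditional error - conditional error).
    A positive value means conditioning on cond helps the model denoise x0.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    diffs = []
    for _ in range(n_trials):
        t = rng.integers(len(alphas_bar))          # random timestep
        eps = rng.standard_normal(x0.shape)        # noise to be recovered
        ab = alphas_bar[t]
        x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps  # forward diffusion
        err_uncond = np.sum((eps - eps_model(x_t, t, None)) ** 2)
        err_cond = np.sum((eps - eps_model(x_t, t, cond)) ** 2)
        diffs.append(err_uncond - err_cond)
    return float(np.mean(diffs))

def satisfies_all(eps_model, x0, conds, alphas_bar, **kwargs):
    """Accept x0 for an AND-composed prompt only if every lift is positive."""
    return all(lift_score(eps_model, x0, c, alphas_bar, **kwargs) > 0
               for c in conds)

def toy_eps_model(x_t, t, cond):
    """Toy stand-in for a diffusion model's noise predictor (an assumption).

    A condition is a pair (axis, value) meaning "coordinate axis equals value".
    Without a condition the toy simply predicts zero noise.
    """
    pred = np.zeros_like(x_t)
    if cond is not None:
        axis, value = cond
        ab = ALPHAS_BAR[t]
        # Exact noise on that axis if the clean coordinate equals `value`.
        pred[axis] = (x_t[axis] - np.sqrt(ab) * value) / np.sqrt(1.0 - ab)
    return pred

conds = [(0, 1.0), (1, 2.0)]
x_good = np.array([1.0, 2.0])    # matches both conditions
x_bad = np.array([1.0, -8.0])    # violates the second condition
keep_good = satisfies_all(toy_eps_model, x_good, conds, ALPHAS_BAR)
keep_bad = satisfies_all(toy_eps_model, x_bad, conds, ALPHAS_BAR)
```

In this toy, `keep_good` is true and `keep_bad` is false, so a resampling loop would keep the first sample and reject the second. Only the composition rule for conjunctions is shown here; how other logical operators are handled is left out because the abstract does not describe it.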

Code Implementations: 1 repo
