CV AIFeb 7, 2025

Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?

Yujin Han, Andi Han, Wei Huang, Chaochao Lu, Difan Zou

arXiv:2502.04725v119.714 citationsh-index: 7ICML

Originality Incremental advance

AI Analysis

This addresses a limitation in diffusion models for image generation, revealing their inability to learn subtle feature dependencies, which is incremental but important for improving model reliability.

The paper investigates whether diffusion models can learn hidden inter-feature rules in images, such as lighting-shadow relationships, and finds they consistently fail to capture fine-grained rules despite identifying coarse ones, with theoretical analysis showing denoising score matching is incompatible with rule conformity.

Despite the remarkable success of diffusion models (DMs) in data generation, they exhibit specific failure cases with unsatisfactory outputs. We focus on one such limitation: the ability of DMs to learn hidden rules between image features. Specifically, for image data with dependent features ($\mathbf{x}$) and ($\mathbf{y}$) (e.g., the height of the sun ($\mathbf{x}$) and the length of the shadow ($\mathbf{y}$)), we investigate whether DMs can accurately capture the inter-feature rule ($p(\mathbf{y}|\mathbf{x})$). Empirical evaluations on mainstream DMs (e.g., Stable Diffusion 3.5) reveal consistent failures, such as inconsistent lighting-shadow relationships and mismatched object-mirror reflections. Inspired by these findings, we design four synthetic tasks with strongly correlated features to assess DMs' rule-learning abilities. Extensive experiments show that while DMs can identify coarse-grained rules, they struggle with fine-grained ones. Our theoretical analysis demonstrates that DMs trained via denoising score matching (DSM) exhibit constant errors in learning hidden rules, as the DSM objective is not compatible with rule conformity. To mitigate this, we introduce a common technique - incorporating additional classifier guidance during sampling, which achieves (limited) improvements. Our analysis reveals that the subtle signals of fine-grained rules are challenging for the classifier to capture, providing insights for future exploration.

View on arXiv PDF

Similar