VSDiffusion: Taming Ill-Posed Shadow Generation via Visibility-Constrained Diffusion
This work improves the generation of geometrically consistent shadows for image composition, which benefits graphic designers and content creators.
This paper addresses the ill-posed problem of generating realistic cast shadows for foreground objects inserted during image composition. The proposed VSDiffusion framework uses visibility priors and achieves new state-of-the-art results on the DESOBAv2 dataset across most evaluation metrics.
Generating realistic cast shadows for inserted foreground objects is a crucial yet challenging problem in image composition. Because shadow formation is ill-posed, keeping the shadow geometrically consistent with its object in complex scenes remains difficult. To address this issue, we propose VSDiffusion, a visibility-constrained two-stage framework that narrows the solution space by incorporating visibility priors. In Stage I, we predict a coarse shadow mask to localize the regions where a shadow can plausibly be cast. In Stage II, we perform conditional diffusion guided by lighting and depth cues estimated from the composite image to generate accurate shadows. VSDiffusion injects visibility priors through two complementary pathways: first, a visibility control branch with shadow-gated cross attention that provides multi-scale structural guidance; second, a learned soft prior map that reweights the training loss in error-prone regions to strengthen geometric correction. We additionally introduce a high-frequency guided enhancement module that sharpens shadow boundaries and improves texture interaction with the background. Experiments on the widely used public DESOBAv2 dataset demonstrate that VSDiffusion generates accurate shadows and establishes new state-of-the-art results across most evaluation metrics.
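A small loss-reweighting sketch can illustrate the idea behind the soft prior map pathway. The abstract does not specify the weighting, so everything below is an assumption: the function name `soft_prior_weighted_mse`, the per-pixel weight `1 + alpha * prior`, and the normalization by total weight are hypothetical. The prior is also taken as a fixed input in [0, 1] rather than learned, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
from typing import List

# A 2D map (height x width) represented as nested lists.
Grid = List[List[float]]


def soft_prior_weighted_mse(pred: Grid, target: Grid, prior: Grid, alpha: float = 1.0) -> float:
    """Hypothetical prior-reweighted squared error (not the paper's exact loss).

    Each pixel's squared error is multiplied by w = 1 + alpha * prior, where
    prior values lie in [0, 1] and are assumed to be larger in error-prone
    regions. Dividing by the total weight makes this a weighted mean, so the
    result stays bounded by the largest per-pixel squared error.
    """
    if not (len(pred) == len(target) == len(prior)):
        raise ValueError("pred, target and prior must have the same height")
    weighted_sum = 0.0
    weight_total = 0.0
    for row_p, row_t, row_m in zip(pred, target, prior):
        if not (len(row_p) == len(row_t) == len(row_m)):
            raise ValueError("pred, target and prior must have the same width")
        for p, t, m in zip(row_p, row_t, row_m):
            if not 0.0 <= m <= 1.0:
                raise ValueError("prior values must lie in [0, 1]")
            w = 1.0 + alpha * m  # up-weight pixels the prior marks as error-prone
            weighted_sum += w * (p - t) ** 2
            weight_total += w
    return weighted_sum / weight_total


# Same unit error, placed in a high-prior pixel versus a low-prior pixel.
print(soft_prior_weighted_mse([[0.0, 1.0]], [[0.0, 0.0]], [[0.0, 1.0]], alpha=1.0))
print(soft_prior_weighted_mse([[1.0, 0.0]], [[0.0, 0.0]], [[0.0, 1.0]], alpha=1.0))
```

With `alpha=1.0`, a unit error in the high-prior pixel yields 2/3 and the same error in the low-prior pixel yields 1/3, against 0.5 for the unweighted mean; setting `alpha=0.0` recovers plain MSE. In a diffusion model the same weighting would typically be applied to the noise-prediction error, but that detail is likewise an assumption here.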