Constrained Diffusion with Trust Sampling
This work proposes a novel method for a known bottleneck of diffusion models: constrained generation, in applications such as image and 3D motion synthesis.
The paper tackles the problem that diffusion models struggle to satisfy challenging constraints. It introduces a training-free method that formulates a series of constrained optimizations during inference and uses trust sampling to balance following the unconditional diffusion model against adhering to loss guidance. The result is a significant improvement in generation quality over existing methods in both image and 3D motion generation.
Diffusion models have demonstrated significant promise in various generative tasks; however, they often struggle to satisfy challenging constraints. Our approach addresses this limitation by rethinking training-free loss-guided diffusion from an optimization perspective. We formulate a series of constrained optimizations throughout the inference process of a diffusion model. In each optimization, we allow the sample to take multiple steps along the gradient of the proxy constraint function until we can no longer trust the proxy, as determined by the variance at that diffusion level. Additionally, we estimate the state manifold of the diffusion model to allow for early termination when the sample starts to wander away from the state manifold at each diffusion step. Trust sampling effectively balances following the unconditional diffusion model and adhering to the loss guidance, enabling more flexible and accurate constrained generation. We demonstrate the efficacy of our method through extensive experiments on complex tasks in the drastically different domains of image and 3D motion generation, showing significant improvements over existing methods in terms of generation quality. Our implementation is available at https://github.com/will-s-h/trust-sampling.
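To make the optimization view concrete, here is a minimal toy sketch in NumPy. It is not the authors' implementation (that lives in the linked repository), and every detail below is an assumption chosen for illustration. The data are assumed to be x0 ~ N(0, I), so the ideal denoiser is linear and known in closed form. The constraint is a hypothetical inpainting-style loss on the first few coordinates. The trust budget (more proxy-gradient steps at lower noise levels) and the manifold estimate (a shell of radius sqrt(d), near which this toy model's N(0, I) states concentrate) are illustrative stand-ins, not the paper's exact rules. Names such as `trust_sample`, `proxy_grad`, `max_inner`, and `manifold_tol` are hypothetical.

```python
import numpy as np


def alpha_bar(t, T):
    # Assumed cosine-style noise schedule: exactly 1 at t = 0, ~0 at t = T.
    return np.cos(0.5 * np.pi * t / T) ** 2


def x0_hat(x_t, ab):
    # Toy denoiser: for data x0 ~ N(0, I), E[x0 | x_t] = sqrt(ab) * x_t.
    return np.sqrt(ab) * x_t


def eps_hat(x_t, ab):
    # Matching toy noise prediction: E[eps | x_t] = sqrt(1 - ab) * x_t.
    return np.sqrt(1.0 - ab) * x_t


def constraint_loss(x0, y):
    # Hypothetical constraint: the first len(y) coordinates of x0 should equal y.
    r = x0[: y.size] - y
    return float(r @ r)


def proxy_grad(x_t, ab, y):
    # Gradient of the proxy loss L(x0_hat(x_t)) with respect to x_t (chain rule).
    g = np.zeros_like(x_t)
    g[: y.size] = 2.0 * np.sqrt(ab) * (x0_hat(x_t, ab)[: y.size] - y)
    return g


def trust_sample(y, d=64, T=50, lr=0.5, max_inner=10, manifold_tol=0.35,
                 seed=0, guided=True):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    radius = np.sqrt(d)  # toy manifold estimate: N(0, I) states concentrate near this norm
    for t in range(T, 0, -1):
        ab, ab_prev = alpha_bar(t, T), alpha_bar(t - 1, T)
        if guided:
            sigma = np.sqrt(1.0 - ab)
            # Hypothetical trust budget: allow more proxy steps when the noise level is low.
            budget = max(1, int(round(max_inner * (1.0 - sigma))))
            for _ in range(budget):
                x_new = x - lr * proxy_grad(x, ab, y)
                if abs(np.linalg.norm(x_new) - radius) > manifold_tol * radius:
                    break  # early termination: the step would leave the estimated manifold
                x = x_new
        # Deterministic DDIM update (eta = 0) to the next, less noisy level.
        x = np.sqrt(ab_prev) * x0_hat(x, ab) + np.sqrt(1.0 - ab_prev) * eps_hat(x, ab)
    return x  # at the last step ab_prev = 1, so this is the x0 estimate


y = np.array([1.0, -1.0, 0.5, 2.0])
guided = constraint_loss(trust_sample(y), y)
unguided = constraint_loss(trust_sample(y, guided=False), y)
print(f"guided loss {guided:.2e}, unguided loss {unguided:.2e}")
```

With guidance, the constrained coordinates end up close to `y`. Without it, the sampler simply draws from the toy prior. In this sketch the early-termination check rejects any inner step whose result would leave the assumed shell instead of projecting back onto it, so the unconditional DDIM update always has the final say at each level.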