CV, GR, LG | Jun 20, 2023

Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision

arXiv:2306.11719v2 | 120 citations | h-index: 137
Originality: Highly original
AI Analysis

The paper addresses a limitation of diffusion models: they require direct training samples of the signal being modeled. Removing this requirement matters for applications such as inverse graphics, where ground-truth data is unavailable.

The paper tackles the problem of generating signals from distributions that are never directly observed, using only partial observations obtained through a known differentiable forward model. It demonstrates the approach on challenging computer vision tasks such as inverse graphics, where it enables direct sampling of 3D scenes from a single 2D image.

Denoising diffusion models are a powerful type of generative model used to capture complex distributions of real-world signals. However, their applicability is limited to scenarios where training samples are readily available, which is not always the case in real-world applications. For example, in inverse graphics, the goal is to generate samples from a distribution of 3D scenes that align with a given image, but ground-truth 3D scenes are unavailable and only 2D images are accessible. To address this limitation, we propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed. Instead, these signals are measured indirectly through a known differentiable forward model, which produces partial observations of the unknown signal. Our approach involves integrating the forward model directly into the denoising process. This integration effectively connects the generative modeling of observations with the generative modeling of the underlying signals, allowing for end-to-end training of a conditional generative model over signals. During inference, our approach enables sampling from the distribution of underlying signals that are consistent with a given partial observation. We demonstrate the effectiveness of our method on three challenging computer vision tasks. For instance, in the context of inverse graphics, our model enables direct sampling from the distribution of 3D scenes that align with a single 2D input image.
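
The idea of integrating the forward model into the denoising process can be illustrated with a toy sketch. This is not the authors' implementation; it is a minimal illustration of one plausible reading of the abstract, assuming that the denoiser estimates the unobserved signal from a noisy observation plus a conditioning observation, and that the loss is computed only after passing that estimate through the forward model. The names `forward_model`, `toy_denoiser`, `observation_space_loss`, and the blending parameter `weight` are hypothetical, and the pairwise-average forward model stands in for something like a differentiable renderer.

```python
import math
import random


def forward_model(signal):
    """Toy known forward model: only pairwise averages of the signal are observed."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]


def add_noise(observation, alpha_bar, rng):
    """DDPM-style noising of an observation: sqrt(a) * y + sqrt(1 - a) * eps."""
    return [
        math.sqrt(alpha_bar) * y + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
        for y in observation
    ]


def toy_denoiser(noisy_obs, context_obs, weight):
    """Hypothetical stand-in for a neural network.

    Blends the noisy observation with the conditioning observation and
    upsamples by duplication to produce an estimate of the full signal.
    """
    signal_hat = []
    for y_t, c in zip(noisy_obs, context_obs):
        v = weight * c + (1.0 - weight) * y_t
        signal_hat.extend([v, v])
    return signal_hat


def observation_space_loss(target_obs, context_obs, alpha_bar, weight, rng):
    """One training-style evaluation: the signal is never compared to ground truth."""
    noisy = add_noise(target_obs, alpha_bar, rng)           # noise the observation
    signal_hat = toy_denoiser(noisy, context_obs, weight)   # estimate hidden signal
    obs_hat = forward_model(signal_hat)                     # map back to observation space
    return sum((a - b) ** 2 for a, b in zip(obs_hat, target_obs)) / len(target_obs)


rng = random.Random(0)
true_signal = [1.0, 3.0, 5.0, 7.0]        # used only to simulate observations
observation = forward_model(true_signal)  # the only data a learner would see
loss = observation_space_loss(observation, observation, 0.5, 0.3, rng)
```

In this sketch the supervision signal lives entirely in observation space, which is why no ground-truth signal is needed. In a real setting the denoiser would be a trained network and the forward model would need to be differentiable so that gradients of the loss can reach the denoiser's parameters.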

Code Implementations: 1 repo
