GECCO: Geometrically-Conditioned Point Diffusion Models
This work addresses the problem of generating high-fidelity, geometrically consistent point clouds from images for applications in computer vision and 3D modeling. It represents an incremental improvement over current diffusion-based methods.
The paper tackles image-conditional point cloud generation by introducing a geometrically motivated conditioning scheme that projects sparse image features into the point cloud at every denoising step, improving geometric consistency and fidelity over existing methods that rely on unstructured, global latent codes. It performs on par with or better than the state of the art on synthetic data, with faster and lighter models, and scales to diverse indoor scenes.
Diffusion models generating images conditionally on text, such as Dall-E 2 and Stable Diffusion, have recently made a splash far beyond the computer vision community. Here, we tackle the related problem of generating point clouds, both unconditionally and conditionally on images. For the latter, we introduce a novel geometrically motivated conditioning scheme based on projecting sparse image features into the point cloud and attaching them to each individual point, at every step in the denoising process. This approach improves geometric consistency and yields greater fidelity than current methods relying on unstructured, global latent codes. Additionally, we show how to apply recent continuous-time diffusion schemes. Our method performs on par with or above the state of the art in conditional and unconditional experiments on synthetic data, while being faster, lighter, and delivering tractable likelihoods. We show it can also scale to diverse indoor scenes.
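To make the conditioning idea concrete, the following is a minimal NumPy sketch of projecting points into an image feature map and attaching the sampled features to each point. It is an illustration under stated assumptions, not the authors' implementation: the pinhole intrinsics matrix `K`, the function names (`project_points`, `sample_features`, `condition_points`), bilinear sampling, and border clamping are all hypothetical choices, and the points are assumed to lie in the camera frame with positive depth. In a diffusion model, a step like `condition_points` would be repeated at every denoising step, since the noisy points move and therefore project to different pixels.

```python
import numpy as np

def project_points(points, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coords.

    Assumes every point has positive depth (z > 0).
    """
    uvw = points @ K.T                      # (N, 3) homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]         # divide by depth

def sample_features(feature_map, uv):
    """Bilinearly sample an (H, W, C) feature map at (N, 2) coords (u = x, v = y).

    Coordinates outside the image are clamped to the border (an assumption).
    """
    H, W, _ = feature_map.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    top = (1 - du) * feature_map[v0, u0] + du * feature_map[v0, u1]
    bottom = (1 - du) * feature_map[v1, u0] + du * feature_map[v1, u1]
    return (1 - dv) * top + dv * bottom     # (N, C)

def condition_points(noisy_points, feature_map, K):
    """Attach per-point image features to the current noisy point cloud.

    Returns an (N, 3 + C) array: xyz followed by the sampled features.
    A denoising network (not shown) would consume this array.
    """
    uv = project_points(noisy_points, K)
    feats = sample_features(feature_map, uv)
    return np.concatenate([noisy_points, feats], axis=1)
```

Because each point receives features from the pixel it projects to, the conditioning signal is local and tied to geometry, in contrast to a single global latent code shared by all points.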