DreamTexture: Shape from Virtual Texture with Analysis by Augmentation
This work addresses efficient monocular 3D reconstruction, proposing a paradigm that avoids the computationally expensive multi-view rendering and memory-intensive volumetric representations used by prior methods such as DreamFusion.
The paper tackles the computational expense of 3D reconstruction from virtual views. It proposes DreamTexture, which aligns a virtual texture with the monocular depth cues in an input image and reconstructs 3D shape from the deformation of that texture; the results indicate that generative models encode an understanding of monocular shape cues.
DreamFusion established a new paradigm for unsupervised 3D reconstruction from virtual views by combining advances in generative models and differentiable rendering. However, the underlying multi-view rendering, along with supervision from large-scale generative models, is computationally expensive and under-constrained. We propose DreamTexture, a novel Shape-from-Virtual-Texture approach that leverages monocular depth cues to reconstruct 3D objects. Our method textures an input image by aligning a virtual texture with the real depth cues in the input, exploiting the inherent understanding of monocular geometry encoded in modern diffusion models. We then reconstruct depth from the virtual texture deformation with a new conformal map optimization, which avoids the need for memory-intensive volumetric representations. Our experiments reveal that generative models possess an understanding of monocular shape cues, which can be extracted by augmenting and aligning texture cues. We call this novel monocular reconstruction paradigm Analysis by Augmentation.
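
To make the term "conformal map" concrete, the following is a minimal sketch, not the paper's optimization. It assumes the texture deformation is given as a 2D map (x, y) -> (u, v) sampled on a regular grid, and it measures how far that map is from being conformal (angle-preserving) using the Cauchy-Riemann equations u_x = v_y and u_y = -v_x. The function names `conformal_residual` and `local_scale` and the two test maps are hypothetical, chosen only for illustration. How the local scale would be converted to depth depends on the camera and surface model, which the abstract does not specify.

```python
import numpy as np

def conformal_residual(u, v, xs, ys):
    """Mean squared Cauchy-Riemann residual of the map (x, y) -> (u, v).

    u, v have shape (len(ys), len(xs)); a value of 0 means the sampled
    map is conformal up to finite-difference error.
    """
    # np.gradient returns derivatives in axis order: (d/dy, d/dx).
    u_y, u_x = np.gradient(u, ys, xs, edge_order=2)
    v_y, v_x = np.gradient(v, ys, xs, edge_order=2)
    return float(np.mean((u_x - v_y) ** 2 + (u_y + v_x) ** 2))

def local_scale(u, v, xs, ys):
    """Local stretch factor of the map; meaningful when the map is conformal."""
    u_y, u_x = np.gradient(u, ys, xs, edge_order=2)
    return np.sqrt(u_x ** 2 + u_y ** 2)

# Illustrative grid (assumption: any regular grid would do).
xs = np.linspace(0.5, 1.5, 33)
ys = np.linspace(0.5, 1.5, 33)
X, Y = np.meshgrid(xs, ys)  # X[i, j] = xs[j], Y[i, j] = ys[i]

# A conformal map: f(z) = z^2, i.e. u = x^2 - y^2, v = 2xy.
u_conf, v_conf = X ** 2 - Y ** 2, 2 * X * Y

# A non-conformal map: anisotropic stretch along x only.
u_stretch, v_stretch = 2 * X, Y

res_conf = conformal_residual(u_conf, v_conf, xs, ys)
res_stretch = conformal_residual(u_stretch, v_stretch, xs, ys)
scale_conf = local_scale(u_conf, v_conf, xs, ys)

print("residual (conformal map):", res_conf)
print("residual (stretched map):", res_stretch)
```

In this toy setting the residual separates the angle-preserving map from the anisotropic stretch, and for the conformal map the local scale equals |f'(z)| = 2|z|. A residual of this kind could serve as a penalty term in an optimization over the map, but that use is an assumption here rather than something the abstract states.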