Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion
The work addresses practical 3D reconstruction for applications such as AR and robotics, and offers an incremental improvement by integrating pose estimation into existing NeRF-GAN methods.
The paper tackles 3D reconstruction from a single image without ground-truth poses, introducing a framework that recovers shape, pose, and appearance via bootstrapped radiance field inversion, achieving state-of-the-art results and enabling de-rendering in as few as 10 steps.
Neural Radiance Fields (NeRF) coupled with GANs represent a promising direction in the area of 3D reconstruction from a single view, owing to their ability to efficiently model arbitrary topologies. Recent work in this area, however, has mostly focused on synthetic datasets where exact ground-truth poses are known, and has overlooked pose estimation, which is important for certain downstream applications such as augmented reality (AR) and robotics. We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available. Our approach recovers an SDF-parameterized 3D shape, pose, and appearance from a single image of an object, without exploiting multiple views during training. More specifically, we leverage an unconditional 3D-aware generator and apply a hybrid inversion scheme to it: a model produces a first guess of the solution, which is then refined via optimization. Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios. We demonstrate state-of-the-art results on a variety of real and synthetic benchmarks.
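The hybrid inversion idea (a learned first guess followed by a short optimization) can be illustrated with a toy sketch. This is not the paper's implementation: the fixed linear map `W` stands in for the 3D-aware generator, `encode` is a deliberately crude hypothetical stand-in for the model that produces the first guess, and the two-dimensional latent `z` stands in for the shape, pose, and appearance parameters. All names, the step size, and the squared-error loss are assumptions made for illustration.

```python
# Toy stand-in for a generator: a fixed linear map from a 2-D latent to a 4-D "image".
# (Hypothetical; a real 3D-aware generator would render a radiance field.)
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, -0.5]]


def generate(z):
    """Map latent z to a toy 'image' x = W z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def encode(x):
    """Crude first guess: 0.5 * W^T x (an imperfect inverse, assumed for illustration)."""
    return [0.5 * sum(W[i][j] * x[i] for i in range(len(W))) for j in range(2)]


def loss(z, x):
    """Squared reconstruction error between generate(z) and the target x."""
    return 0.5 * sum((g - xi) ** 2 for g, xi in zip(generate(z), x))


def grad(z, x):
    """Gradient of the loss with respect to z: W^T (W z - x)."""
    residual = [g - xi for g, xi in zip(generate(z), x)]
    return [sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(2)]


def refine(z, x, steps=10, lr=0.2):
    """Refine a latent by plain gradient descent for a fixed number of steps."""
    for _ in range(steps):
        g = grad(z, x)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


def invert(x, steps=10):
    """Hybrid inversion: bootstrap from the encoder's guess, then optimize."""
    return refine(encode(x), x, steps=steps)


target = generate([1.0, -2.0])          # "image" produced by a known latent
z_boot = invert(target)                 # encoder guess + 10 refinement steps
z_cold = refine([0.0, 0.0], target)     # same 10 steps from an uninformed start

print("bootstrapped loss:", loss(z_boot, target))
print("cold-start loss:  ", loss(z_cold, target))
```

In this toy setting, starting from the encoder's guess yields a lower reconstruction error than starting from zero under the same 10-step budget, which is the intuition behind bootstrapping the optimization rather than running it from scratch.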