Reconstruct, Rasterize and Backprop: Dense shape and pose estimation from a single image
The work addresses the need for accurate object shape and pose estimation from monocular images in downstream robotics applications, and is presented as a novel integration of single-image reconstruction with differentiable rendering rather than an incremental improvement.
The paper tackles the problem of estimating both dense 3D object shape and 6-DoF pose from a single image; prior methods recover only the shape, often in a canonical frame. The result is significantly lower pose estimation error than prior art, with reconstructions expressed in the camera frame, as robotics tasks require.
This paper presents a new system to obtain dense object reconstructions along with 6-DoF poses from a single image. Aiming for high-fidelity reconstruction, several recent approaches leverage implicit surface representations and deep neural networks to estimate a 3D mesh of an object, given a single image. However, all such approaches recover only the shape of an object; the reconstruction is often in a canonical frame, unsuitable for downstream robotics tasks. To address this, we leverage recent advances in differentiable rendering (in particular, rasterization) to close the loop with 3D reconstruction in the camera frame. We demonstrate that our approach, dubbed reconstruct, rasterize and backprop (RRB), achieves significantly lower pose estimation errors than prior art and recovers dense object shapes and poses from imagery. We further extend our results to an offline setting, where we demonstrate a dense, monocular, object-centric egomotion estimation system.
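The loop the abstract describes can be illustrated with a toy sketch: render the reconstructed object under a pose hypothesis, compare the rendering with the image, and push the error back into the pose. This is not RRB's implementation. RRB backpropagates through a differentiable rasterizer over a dense mesh, whereas the sketch below swaps in a sparse reprojection residual (the eight corners of a cube stand in for the reconstructed shape), a finite-difference Jacobian in place of autodiff, and Gauss-Newton updates in place of gradient-based training. It assumes the shape is already reconstructed and exact and that the observations are noise-free. The intrinsics (`f`, `cx`, `cy`), the axis-angle plus translation pose vector, and all function names (`render`, `refine_pose`, `solve`, and so on) are hypothetical.

```python
import math

# Hypothetical toy: sparse reprojection + Gauss-Newton stands in for RRB's
# dense differentiable rasterization + backprop. Not the authors' code.


def rodrigues(w):
    """Axis-angle vector (3 floats) -> 3x3 rotation matrix via Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in w)
    c, s, v = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def render(points, pose, f=500.0, cx=320.0, cy=240.0):
    """'Render' canonical-frame points: move them into the camera frame, then pinhole-project."""
    R, t = rodrigues(pose[:3]), pose[3:]
    pixels = []
    for X in points:
        Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        pixels += [f * Xc[0] / Xc[2] + cx, f * Xc[1] / Xc[2] + cy]
    return pixels


def residuals(points, pose, observed):
    """Per-coordinate pixel differences between the rendering and the observation."""
    return [p - o for p, o in zip(render(points, pose), observed)]


def loss(points, pose, observed):
    """Mean squared pixel error."""
    r = residuals(points, pose, observed)
    return sum(x * x for x in r) / len(r)


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def refine_pose(points, observed, pose, iters=10, eps=1e-6, damping=1e-6):
    """Render-and-compare pose refinement: Gauss-Newton with a finite-difference Jacobian."""
    pose = list(pose)
    for _ in range(iters):
        r = residuals(points, pose, observed)
        # Jacobian columns d r / d pose_k by central differences (stand-in for autodiff).
        cols = []
        for k in range(6):
            hi, lo = list(pose), list(pose)
            hi[k] += eps
            lo[k] -= eps
            rh, rl = residuals(points, hi, observed), residuals(points, lo, observed)
            cols.append([(a - b) / (2 * eps) for a, b in zip(rh, rl)])
        # Damped normal equations: (J^T J + damping * I) delta = -J^T r.
        JtJ = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (damping if i == j else 0.0)
                for j in range(6)] for i in range(6)]
        Jtr = [-sum(a * b for a, b in zip(cols[i], r)) for i in range(6)]
        delta = solve(JtJ, Jtr)
        pose = [p + d for p, d in zip(pose, delta)]
    return pose


# Usage: the corners of a 0.2 m cube serve as the hypothetical reconstructed shape.
cube = [[0.1 * sx, 0.1 * sy, 0.1 * sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
true_pose = [0.2, -0.3, 0.1, 0.05, -0.02, 1.0]  # axis-angle (rad), then translation (m)
observed = render(cube, true_pose)              # synthetic, noise-free "image" evidence
init = [0.25, -0.35, 0.2, 0.07, -0.04, 1.1]     # perturbed initial pose guess
est = refine_pose(cube, observed, init)
print("loss before:", loss(cube, init, observed), "after:", loss(cube, est, observed))
```

Because the estimated pose maps the canonical-frame shape into the camera frame, the shape together with `est` gives a camera-frame reconstruction, which is the property the summary stresses. A dense variant closer to RRB would presumably replace `render` with a differentiable mesh rasterizer and the sparse residual with a per-pixel image loss, optimized by backpropagation. The exact loss RRB uses is not stated in this summary.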