Single View Metrology in the Wild
This work addresses the scale ambiguity of 3D reconstruction, a known bottleneck for applications such as virtual object insertion, with a novel method.
The paper tackles the problem of recovering absolute scale and camera parameters from a single monocular image in unconstrained conditions, achieving state-of-the-art results on multiple datasets and validating perceptual quality through a user study.
Most 3D reconstruction methods can only recover scene properties up to a global scale ambiguity. We present a novel approach to single view metrology that recovers the absolute scale of a scene, represented by the 3D heights of objects or the camera height above the ground, as well as the camera's orientation and field of view, using just a monocular image acquired in unconstrained conditions. Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights, through estimation of bounding box projections. We leverage categorical priors for objects that commonly occur in natural images, such as humans or cars, as references for scale estimation. We demonstrate state-of-the-art qualitative and quantitative results on several datasets, as well as applications including virtual object insertion. Furthermore, the perceptual quality of our outputs is validated by a user study.
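The interplay between the unknown camera (pitch, field of view, height) and object heights via bounding box projections can be illustrated with plain geometry. The sketch below assumes a zero-roll pinhole camera, upright objects standing on a flat ground plane, and image rows measured in pixels upward from the principal point. `Camera`, `bbox_v`, `height_from_bbox`, `approx_height` and `camera_height_from_reference` are hypothetical names chosen for illustration. The approximate relation is the classic small-pitch formula from Hoiem et al., "Putting Objects in Perspective", not necessarily the formulation this paper uses, and no learned component is shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Zero-roll pinhole camera (an assumption of this sketch)."""
    pitch: float   # radians, positive = tilted up
    vfov: float    # vertical field of view, radians
    height: float  # camera height above the ground plane, metres
    image_h: int   # image height, pixels

    @property
    def focal(self) -> float:
        # Focal length in pixels derived from the vertical field of view.
        return (self.image_h / 2) / math.tan(self.vfov / 2)

    def project_v(self, forward: float, up: float) -> float:
        """Image row of a world point `forward` metres ahead of and `up`
        metres above the camera centre (rotation about the camera x-axis)."""
        c, s = math.cos(self.pitch), math.sin(self.pitch)
        y_cam = up * c - forward * s
        z_cam = up * s + forward * c
        return self.focal * y_cam / z_cam

    def horizon_v(self) -> float:
        # Ground points at infinity project to v0 = -f * tan(pitch).
        return -self.focal * math.tan(self.pitch)

    def bbox_v(self, distance: float, obj_height: float) -> tuple[float, float]:
        """Forward model: (bottom, top) rows of an upright object on the ground."""
        bottom = self.project_v(distance, -self.height)
        top = self.project_v(distance, obj_height - self.height)
        return bottom, top

    def height_from_bbox(self, v_bottom: float, v_top: float) -> float:
        """Exact inverse: intersect the bottom-row ray with the ground, then
        intersect the top-row ray with the vertical line through that point."""
        c, s, f = math.cos(self.pitch), math.sin(self.pitch), self.focal
        # The ray through row v has world direction (forward, up) = (f*c - v*s, v*c + f*s).
        up_b = v_bottom * c + f * s
        if up_b >= 0:
            raise ValueError("bottom edge must lie below the horizon")
        t = -self.height / up_b
        distance = t * (f * c - v_bottom * s)
        s_top = distance / (f * c - v_top * s)
        return self.height + s_top * (v_top * c + f * s)


def approx_height(cam_height: float, v_horizon: float, v_bottom: float, v_top: float) -> float:
    """Small-pitch approximation: h ~= h_c * (v_t - v_b) / (v_0 - v_b)."""
    return cam_height * (v_top - v_bottom) / (v_horizon - v_bottom)


def camera_height_from_reference(ref_height: float, v_horizon: float,
                                 v_bottom: float, v_top: float) -> float:
    """Same approximation inverted: an object of assumed height (e.g. a
    hypothetical category prior for people) yields an estimate of h_c."""
    return ref_height * (v_horizon - v_bottom) / (v_top - v_bottom)


if __name__ == "__main__":
    cam = Camera(pitch=math.radians(5), vfov=math.radians(60), height=1.6, image_h=480)
    vb, vt = cam.bbox_v(distance=8.0, obj_height=1.75)
    print("exact inverse:", cam.height_from_bbox(vb, vt))
    print("approximation:", approx_height(cam.height, cam.horizon_v(), vb, vt))
    print("camera height from a 1.75 m reference:",
          camera_height_from_reference(1.75, cam.horizon_v(), vb, vt))
```

The exact inverse needs the pitch and field of view, while the approximation needs only the horizon row. This illustrates why the camera parameters and the scale of objects constrain each other: a reference object of known typical height fixes the camera height, which in turn fixes the heights of other objects in the scene.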