A Perceptual Measure for Deep Single Image Camera Calibration
This work addresses the need for automated camera calibration in natural images, which enables applications such as virtual object insertion. The contribution is incremental: it builds on existing deep learning approaches and adds a perceptual evaluation measure.
The paper tackles single image camera calibration in uncontrolled settings by proposing a deep convolutional neural network that directly infers the calibration parameters. The network outperforms other methods both in L2 error and on a new perceptual measure based on human judgments of realism.
Most current single image camera calibration methods rely on specific image features or user input, and cannot be applied to natural images captured in uncontrolled settings. We propose directly inferring camera calibration parameters from a single image using a deep convolutional neural network. This network is trained using automatically generated samples from a large-scale panorama dataset, and considerably outperforms other methods, including recent deep learning-based approaches, in terms of standard L2 error. However, we argue that in many cases it is more important to consider how humans perceive errors in camera estimation. To this end, we conduct a large-scale human perception study where we ask users to judge the realism of 3D objects composited with and without ground truth camera calibration. Based on this study, we develop a new perceptual measure for camera calibration, and demonstrate that our deep calibration network outperforms other methods on this measure. Finally, we demonstrate the use of our calibration network for a number of applications including virtual object insertion, image retrieval and compositing.
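To make the training and evaluation setup concrete, the following is a minimal sketch of how ground-truth labels for automatically generated samples might be drawn and how an L2 error could be computed. The abstract does not state which parameters are inferred or how they are sampled, so everything here is an assumption for illustration: the parameter set (pitch, roll, vertical field of view), the uniform sampling ranges, the image height, and the function names `sample_camera` and `l2_error` are all hypothetical. The only fixed relationship used is the standard pinhole conversion from field of view to focal length in pixels.

```python
import math
import random


def sample_camera(rng, image_height=240):
    """Draw one hypothetical ground-truth camera for a crop taken from a panorama.

    The ranges below are illustrative assumptions, not the paper's values.
    """
    pitch = rng.uniform(-math.pi / 6, math.pi / 6)        # radians, assumed range
    roll = rng.uniform(-math.pi / 12, math.pi / 12)       # radians, assumed range
    vfov = rng.uniform(math.radians(30), math.radians(100))  # vertical field of view

    # Pinhole relation: half the image height subtends half the vertical FoV.
    focal_px = (image_height / 2) / math.tan(vfov / 2)
    return {"pitch": pitch, "roll": roll, "vfov": vfov, "focal_px": focal_px}


def l2_error(pred, truth, keys=("pitch", "roll", "vfov")):
    """Euclidean distance between predicted and true parameters (all in radians)."""
    return math.sqrt(sum((pred[k] - truth[k]) ** 2 for k in keys))


if __name__ == "__main__":
    rng = random.Random(0)
    truth = sample_camera(rng)
    # Stand-in for a network prediction: the truth perturbed by a small offset.
    pred = dict(truth, pitch=truth["pitch"] + 0.02, roll=truth["roll"] - 0.01)
    print(f"L2 error: {l2_error(pred, truth):.4f} rad")
```

An error of this kind treats every parameter deviation equally, which is the limitation the perceptual measure is meant to address: two estimates with the same L2 error may differ in how noticeable the resulting compositing artifacts are to a human viewer.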