Out-of-focus: Learning Depth from Image Bokeh for Robotic Perception
This work addresses the problem of poor depth estimation in robotic perception by introducing a novel data collection approach, though it is incremental as it builds on existing CNN methods.
The paper tackles depth estimation from RGB images by using multiple images captured with varying camera focus, leveraging bokeh patterns to infer depth, and demonstrates improved performance over single-image methods on both standard and custom datasets.
In this project, we propose a novel approach for estimating depth from RGB images. Most prior work estimates depth from a single RGB image, which is inherently difficult and generally performs poorly, even with thousands of examples. We instead use multiple RGB images captured while varying the focus of the camera's lens. This method leverages the depth information carried by the different patterns of sharpness and blur across the sequence of focal images, which helps distinguish objects at different depths. Since no data set exists for learning this mapping, we collect our own using customized hardware. We then train a convolutional neural network to learn depth from the stacked focal images. Comparative studies were conducted on both a standard RGBD data set and our own data set (learning from both single and multiple images), and the results verify that stacked focal images yield better depth estimation than a single RGB image.
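The idea of a stacked focal input can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the network or preprocessing used in this work: the function names are hypothetical, channel-wise concatenation is one plausible reading of "stacked focal images", and the Laplacian sharpness measure is a classical depth-from-focus cue included only to show why blur patterns carry depth information.

```python
import numpy as np


def stack_focal_images(images):
    """Concatenate N focal images of shape (H, W, 3) into one (H, W, 3*N) array.

    Assumption: the focal stack is fed to the CNN as extra input channels.
    """
    if not images:
        raise ValueError("need at least one focal image")
    first_shape = images[0].shape
    for img in images:
        if img.shape != first_shape:
            raise ValueError("all focal images must share the same shape")
    return np.concatenate(images, axis=-1)


def sharpness_map(gray):
    """Absolute 4-neighbour Laplacian response; larger where the image is sharp."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    laplacian = (
        padded[:-2, 1:-1] + padded[2:, 1:-1]
        + padded[1:-1, :-2] + padded[1:-1, 2:]
        - 4.0 * padded[1:-1, 1:-1]
    )
    return np.abs(laplacian)


def sharpest_focus_index(images):
    """For each pixel, return the index of the focal image with the highest sharpness.

    Classical depth-from-focus cue (not the learned method): the focus setting
    at which a pixel is sharpest is related to the depth of the surface there.
    """
    gray = [img.mean(axis=-1) for img in images]
    sharpness = np.stack([sharpness_map(g) for g in gray], axis=0)
    return sharpness.argmax(axis=0)


# Usage with synthetic data: three hypothetical focus settings of an 8x8 scene.
rng = np.random.default_rng(0)
focal_images = [rng.random((8, 8, 3)) for _ in range(3)]
cnn_input = stack_focal_images(focal_images)   # shape (8, 8, 9)
focus_index = sharpest_focus_index(focal_images)  # shape (8, 8), values in {0, 1, 2}
```

A learned model can go beyond the hand-crafted cue in `sharpest_focus_index`, since a CNN operating on the stacked input sees the full blur pattern across focus settings rather than only the location of the sharpest slice.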