Bridging Unsupervised and Supervised Depth from Focus via All-in-Focus Supervision
This work addresses depth estimation for real-world applications, where input images are rarely all-in-focus, and offers a flexible training approach that works with or without ground-truth depth.
The paper tackles depth and all-in-focus image estimation from focal stacks, proposing a shared architecture that enables both supervised and unsupervised training. It reports outperforming state-of-the-art methods in accuracy while also running faster at inference.
Depth estimation is a long-standing and important task in computer vision. Most previous works estimate depth from input images and assume that the images are all-in-focus (AiF), which is rarely the case in real-world applications. A few works, on the other hand, take defocus blur into account and treat it as an additional cue for depth estimation. In this paper, we propose a method that estimates not only a depth map but also an AiF image from a set of images with different focus positions (known as a focal stack). We design a shared architecture to exploit the relationship between depth and AiF estimation. As a result, the proposed method can be trained either in a supervised manner with ground-truth depth, or in an \emph{unsupervised} manner with AiF images as supervisory signals. Experiments show that our method outperforms state-of-the-art methods both quantitatively and qualitatively, and is also more efficient at inference time.
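The abstract does not spell out how depth and AiF estimation share an architecture. As an illustration only, the sketch below assumes one plausible realization: a network emits a per-pixel score for every slice of the focal stack, and the normalized scores are reused twice, once to blend the focus positions into a depth value and once to blend the slices into an AiF pixel. The names `depth_and_aif`, `attention_logits`, and `l1_loss` are hypothetical, and the softmax weighting is an assumption rather than the method described in the source.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def depth_and_aif(attention_logits, focal_stack, focus_positions):
    """Blend a focal stack into a depth map and an AiF image (illustrative).

    attention_logits[k][y][x]: hypothetical per-pixel score for slice k
    focal_stack[k][y][x]:      grayscale intensity of slice k
    focus_positions[k]:        focus distance at which slice k was captured
    Returns (depth, aif), each indexed as [y][x].
    """
    n = len(focal_stack)
    h = len(focal_stack[0])
    w = len(focal_stack[0][0])
    depth = [[0.0] * w for _ in range(h)]
    aif = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # One set of weights per pixel, shared by both outputs (assumption).
            weights = softmax([attention_logits[k][y][x] for k in range(n)])
            depth[y][x] = sum(wk * focus_positions[k] for k, wk in enumerate(weights))
            aif[y][x] = sum(wk * focal_stack[k][y][x] for k, wk in enumerate(weights))
    return depth, aif


def l1_loss(pred, target):
    """Mean absolute error between two [y][x] maps."""
    h, w = len(pred), len(pred[0])
    return sum(abs(pred[y][x] - target[y][x]) for y in range(h) for x in range(w)) / (h * w)
```

Under this assumption, supervised training would apply `l1_loss` to the predicted depth and the ground-truth depth, while unsupervised training would apply it to the predicted AiF image and a reference AiF image. Because both outputs come from the same weights, a loss on either output updates the shared scores.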