Robust Shape from Focus via Multiscale Directional Dilated Laplacian and Recurrent Network
This work addresses Shape-from-Focus depth estimation for computer vision applications, presenting an incremental improvement that integrates traditional handcrafted focus measures with deep learning techniques.
The paper tackles artifacts and noise in deep learning-based Shape-from-Focus depth estimation by proposing a hybrid framework that combines handcrafted Directional Dilated Laplacian kernels with a lightweight GRU-based network, reporting higher accuracy and better generalization than state-of-the-art methods.
Shape-from-Focus (SFF) is a passive depth estimation technique that infers scene depth by analyzing focus variations in a focal stack. Most recent deep learning-based SFF methods operate in two stages: first, they extract focus volumes (a per-pixel representation of focus likelihood across the focal stack) using heavy feature encoders; then, they estimate depth via a simple one-step aggregation technique that often introduces artifacts and amplifies noise in the depth map. To address these issues, we propose a hybrid framework. Our method computes multi-scale focus volumes with traditional, handcrafted Directional Dilated Laplacian (DDL) kernels, which capture long-range and directional focus variations to form robust focus volumes. These focus volumes are then fed into a lightweight, multi-scale GRU-based depth extraction module that iteratively refines an initial depth estimate at a lower resolution for computational efficiency. Finally, a learned convex upsampling module within our recurrent network reconstructs high-resolution depth maps while preserving fine scene details and sharp boundaries. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach outperforms state-of-the-art deep learning and traditional methods, achieving superior accuracy and generalization across diverse focal conditions.
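To make the notion of a multi-scale focus volume concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a DDL response can be approximated by a three-tap second difference taken along one of four directions, with the taps spaced `dilation` pixels apart, and that the responses are combined by summing absolute values. The kernel shape, the set of directions, the dilation values, and all function names are illustrative assumptions. The final argmax is the simple one-step aggregation that the abstract criticizes, shown here only as a baseline; it does not stand in for the GRU-based refinement or the learned convex upsampling.

```python
import numpy as np

# Assumed direction set as (dy, dx) steps: horizontal, vertical, two diagonals.
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))


def directional_dilated_laplacian(img, direction, dilation):
    """Second difference of a 2D image along `direction`, taps `dilation` px apart."""
    dy, dx = direction[0] * dilation, direction[1] * dilation
    pad = dilation
    padded = np.pad(img, pad, mode="edge")  # replicate borders
    h, w = img.shape
    center = padded[pad:pad + h, pad:pad + w]
    forward = padded[pad + dy:pad + dy + h, pad + dx:pad + dx + w]
    backward = padded[pad - dy:pad - dy + h, pad - dx:pad - dx + w]
    return forward - 2.0 * center + backward


def focus_volume(stack, dilations=(1, 2, 4)):
    """Map a focal stack of shape (S, H, W) to {dilation: focus volume (S, H, W)}."""
    stack = np.asarray(stack, dtype=np.float64)
    volumes = {}
    for d in dilations:
        vol = np.zeros_like(stack)
        for s, img in enumerate(stack):
            for direction in DIRECTIONS:
                # Accumulate the magnitude of the response in every direction.
                vol[s] += np.abs(directional_dilated_laplacian(img, direction, d))
        volumes[d] = vol
    return volumes


def argmax_depth(volumes):
    """Baseline one-step aggregation: index of the sharpest slice per pixel."""
    total = sum(volumes.values())
    return np.argmax(total, axis=0)
```

Larger dilations widen the spatial support of the kernel without adding taps, which is one plausible reading of how the handcrafted kernels capture long-range focus variations; in the proposed framework these per-scale volumes would be passed to the recurrent module instead of being collapsed by an argmax.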