CV · Mar 27

Zero-Shot Depth from Defocus

Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng

arXiv:2603.26658 · 85.6 · h-index: 14 · Has Code

Predicted impact: top 21% in CV · last 90 days · Originality: Highly original

AI Analysis

This work addresses the practical challenge of overfitting in DfD for computer vision applications. Its contribution is incremental, combining new architectural and data-pipeline improvements.

The paper tackles the problem of zero-shot generalization in Depth from Defocus (DfD) by introducing a new real-world benchmark ZEDD and a Transformer-based network FOSSA, resulting in up to 55.7% error reduction compared to baselines.

Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous works that overfit to a specific dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. We first propose a new real-world DfD benchmark, ZEDD, which contains 8.3x more scenes and significantly higher-quality images and ground-truth depth maps than previous benchmarks. We also design a novel network architecture named FOSSA. FOSSA is a Transformer-based architecture with novel designs tailored to the DfD task. The key contribution is a stack attention layer with a focus distance embedding, allowing efficient information exchange across the focus stack. Finally, we develop a new training data pipeline that lets us use existing large-scale RGBD datasets to generate synthetic focus stacks. Experimental results on ZEDD and other benchmarks show a significant improvement over the baselines, reducing errors by up to 55.7%. The ZEDD benchmark is released at https://zedd.cs.princeton.edu. The code and checkpoints are released at https://github.com/princeton-vl/FOSSA.
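
The abstract does not say how the synthetic focus stacks are rendered from RGBD data. As a rough illustration only, one common starting point is the thin-lens circle of confusion: given a depth map and a set of focus distances, it gives the blur diameter each pixel would have in each slice of the stack. The sketch below assumes that model. The function names (`coc_diameter`, `coc_stack`) and the lens parameters (50 mm focal length, f/2, 5 µm pixel pitch) are hypothetical and are not taken from the FOSSA code.

```python
def coc_diameter(depth_m, focus_m, focal_len_m, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in metres.

    c = A * f * |d - s| / (d * (s - f)), with aperture diameter A = f / N,
    object depth d, focus distance s, and focal length f.
    """
    if depth_m <= focal_len_m or focus_m <= focal_len_m:
        raise ValueError("depth and focus distance must exceed the focal length")
    aperture = focal_len_m / f_number
    return (aperture * focal_len_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_len_m)))


def coc_stack(depth_map, focus_distances,
              focal_len_m=0.05, f_number=2.0, pixel_pitch_m=5e-6):
    """Per-pixel blur diameter in pixels, one 2D map per focus distance.

    The lens parameters are illustrative assumptions, not values from the paper.
    """
    return [
        [[coc_diameter(d, s, focal_len_m, f_number) / pixel_pitch_m for d in row]
         for row in depth_map]
        for s in focus_distances
    ]


# Toy 2x2 metric depth map (metres) and a three-slice focus stack.
depth_map = [[1.0, 2.0],
             [4.0, 2.0]]
focus_distances = [1.0, 2.0, 4.0]
stack = coc_stack(depth_map, focus_distances)

# For each pixel, the slice with the smallest blur is the one focused at its depth.
sharpest = [
    [min(range(len(focus_distances)), key=lambda k: stack[k][r][c])
     for c in range(2)]
    for r in range(2)
]
```

A renderer would then blur the all-in-focus RGB image with a spatially varying kernel of this diameter to produce each slice. That step, and how occlusion boundaries are handled, is where real pipelines differ and is left out here.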
