CV · Sep 23, 2025

Zero-shot Monocular Metric Depth for Endoscopic Images

arXiv:2509.18642v1 · 1 citations · h-index: 17 · Has Code · DEMI@MICCAI
Originality: Synthesis-oriented
AI Analysis

This work addresses the shortage of data and benchmarks for depth estimation in endoscopic images, which matters for clinical applications. Its contribution is incremental: it provides resources rather than a new method.

This paper tackles the lack of robust benchmarks and high-quality datasets for monocular depth estimation in endoscopic images. It presents a comprehensive benchmark of state-of-the-art models on real endoscopic data and introduces a novel synthetic dataset (EndoSynth) with ground truth metric depth and segmentation masks. The results show that fine-tuning depth foundation models with EndoSynth significantly boosts accuracy on most unseen real data.

Monocular relative and metric depth estimation has seen a tremendous boost in the last few years due to sharp advancements in foundation models, in particular transformer-based networks. As applications to the domain of endoscopic images begin to appear, there is still a lack of robust benchmarks and high-quality datasets in that area. This paper addresses these limitations by presenting a comprehensive benchmark of state-of-the-art (metric and relative) depth estimation models evaluated on real, unseen endoscopic images, providing critical insights into their generalisation and performance in clinical scenarios. Additionally, we introduce and publish a novel synthetic dataset (EndoSynth) of endoscopic surgical instruments paired with ground truth metric depth and segmentation masks, designed to bridge the gap between synthetic and real-world data. We demonstrate that fine-tuning depth foundation models using our synthetic dataset boosts accuracy on most unseen real data by a significant margin. By providing both a benchmark and a synthetic dataset, this work advances the field of depth estimation for endoscopic images and serves as an important resource for future research. The project page, EndoSynth dataset and trained weights are available at https://github.com/TouchSurgery/EndoSynth.
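The abstract does not say which error measures the benchmark reports. As an illustration only, the sketch below computes two measures that are widely used in depth estimation: absolute relative error (AbsRel) and the δ < 1.25 accuracy. The function name `depth_metrics`, the `align_scale` flag and the median-scaling step for relative-depth models are assumptions made for this example, not details taken from the paper or its repository.

```python
from statistics import median


def depth_metrics(pred, gt, align_scale=False, min_depth=1e-3):
    """Return (abs_rel, delta1) over valid pixels.

    pred, gt: flat sequences of depths in the same unit (e.g. millimetres).
    align_scale: if True, rescale pred by median(gt) / median(pred) first.
    This is a common way to score relative-depth models, assumed here
    for illustration; metric-depth models are scored without it.
    """
    # Keep only pixels where both prediction and ground truth are positive.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > min_depth and g > min_depth]
    if not pairs:
        raise ValueError("no valid pixels")

    if align_scale:
        scale = median(g for _, g in pairs) / median(p for p, _ in pairs)
        pairs = [(p * scale, g) for p, g in pairs]

    # AbsRel: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / len(pairs)
    # delta1: fraction of pixels with max(pred/gt, gt/pred) < 1.25.
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / len(pairs)
    return abs_rel, delta1


if __name__ == "__main__":
    gt = [10.0, 20.0, 40.0, 80.0]
    pred = [5.0, 10.0, 20.0, 40.0]  # correct structure, wrong global scale
    print(depth_metrics(pred, gt))                    # scored as metric depth
    print(depth_metrics(pred, gt, align_scale=True))  # scored as relative depth
```

In this toy case the prediction has the right shape but half the true scale, so it scores poorly as metric depth and perfectly once the scale is aligned. This is why metric and relative models are usually evaluated under different protocols.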

Code Implementations: 1 repo