CV | Mar 26, 2024

Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos

Stanford
arXiv:2403.17915v4 | 20 citations | h-index: 6 | Has Code | ECCV
Originality: Incremental advance
AI Analysis

This work addresses monocular depth estimation for assistive and robotic surgery, where better depth maps can improve organ coverage and the detection of health issues. The advance is incremental: it builds on existing depth estimation methods and adds new loss functions and a depth refinement network.

The paper tackles monocular depth estimation in endoscopy videos by using photometric cues from near-field lighting, achieving state-of-the-art results on the C3VD dataset and producing high-quality depth maps from clinical data.

Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues. Despite promising progress on mainstream, natural image depth estimation, techniques perform poorly on endoscopy images due to a lack of strong geometric features and challenging illumination effects. In this paper, we utilize the photometric cues, i.e., the light emitted from an endoscope and reflected by the surface, to improve monocular depth estimation. We first create two novel loss functions with supervised and self-supervised variants that utilize a per-pixel shading representation. We then propose a novel depth refinement network (PPSNet) that leverages the same per-pixel shading representation. Finally, we introduce teacher-student transfer learning to produce better depth maps from both synthetic data with supervision and clinical data with self-supervision. We achieve state-of-the-art results on the C3VD dataset while estimating high-quality depth maps from clinical data. Our code, pre-trained models, and supplementary materials can be found on our project page: https://ppsnet.github.io/
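To make the idea of a per-pixel shading representation concrete, here is a minimal sketch of how a shading map could be derived from a depth map under near-field lighting. It is not the authors' implementation. It assumes a Lambertian surface with no albedo term, a single point light co-located with a pinhole camera at the origin, and inverse-square falloff. The function names (`backproject`, `per_pixel_shading`) and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    # Return a zero vector for degenerate input instead of dividing by zero.
    return (a[0] / n, a[1] / n, a[2] / n) if n > 0 else (0.0, 0.0, 0.0)


def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth z to a 3D camera-space point."""
    h, w = len(depth), len(depth[0])
    return [[((u - cx) / fx * depth[v][u],
              (v - cy) / fy * depth[v][u],
              depth[v][u]) for u in range(w)] for v in range(h)]


def per_pixel_shading(depth, fx, fy, cx, cy):
    """Shading = (N . L) / r^2 for a point light at the camera origin.

    Assumes strictly positive depth values (illustrative model only).
    """
    pts = backproject(depth, fx, fy, cx, cy)
    h, w = len(depth), len(depth[0])
    shading = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            # Finite differences of 3D points, clamped at the image border.
            du = _sub(pts[v][min(u + 1, w - 1)], pts[v][max(u - 1, 0)])
            dv = _sub(pts[min(v + 1, h - 1)][u], pts[max(v - 1, 0)][u])
            normal = _normalize(_cross(du, dv))
            x = pts[v][u]
            r = math.sqrt(_dot(x, x))
            # Unit vector from the surface point toward the light (the origin).
            to_light = (-x[0] / r, -x[1] / r, -x[2] / r)
            # Orient the normal so that it faces the camera and light.
            if _dot(normal, to_light) < 0:
                normal = (-normal[0], -normal[1], -normal[2])
            shading[v][u] = _dot(normal, to_light) / (r * r)
    return shading
```

For a flat surface facing the camera, this model makes the shading brightest at the image center and darker toward the edges and at larger depths. That falloff is the kind of photometric cue the abstract refers to; a loss could then compare such a map against image intensities, though the exact formulation used in the paper is not reproduced here.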

