CV | Mar 11, 2024

Transferring Relative Monocular Depth to Surgical Vision with Temporal Consistency

arXiv:2403.06683v214.115 citations | h-index: 7 | Has Code | MICCAI

Originality: Incremental advance

AI Analysis

This work addresses depth estimation in surgical vision and represents an incremental improvement for medical imaging applications.

The paper tackles the transfer of relative monocular depth models from natural images to surgical endoscopy, where ground-truth depth data is scarce. It adds temporal consistency self-supervision to standard supervised training and reports significant performance gains over existing methods.

Relative monocular depth, inferring depth up to shift and scale from a single image, is an active research topic. Recent deep learning models, trained on large and varied meta-datasets, now provide excellent performance in the domain of natural images. However, few datasets exist which provide ground truth depth for endoscopic images, making training such models from scratch unfeasible. This work investigates the transfer of these models into the surgical domain, and presents an effective and simple way to improve on standard supervision through the use of temporal consistency self-supervision. We show temporal consistency significantly improves supervised training alone when transferring to the low-data regime of endoscopy, and outperforms the prevalent self-supervision technique for this task. In addition we show our method drastically outperforms the state-of-the-art method from within the domain of endoscopy. We also release our code, model and ensembled meta-dataset, Meta-MED, establishing a strong benchmark for future work.
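As an illustration of what "depth up to shift and scale" means in practice, the sketch below fits a scale and shift by closed-form least squares so that a relative prediction can be compared against ground truth. This is a common way to evaluate relative depth models in general, not code from the paper; the function names `align_scale_shift` and `aligned_abs_error` are hypothetical, and whether the alignment is done in depth or inverse-depth space is left as an assumption.

```python
def align_scale_shift(pred, target):
    """Return (scale, shift) minimizing sum((scale * p + shift - t) ** 2).

    pred and target are flat sequences of floats (e.g. valid pixels only).
    Hypothetical helper for illustration; not taken from the paper's code.
    """
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Variance and covariance terms (unnormalized; the 1/n factors cancel).
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("prediction is constant, so the scale is undetermined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def aligned_abs_error(pred, target):
    """Mean absolute error after least-squares scale-and-shift alignment."""
    scale, shift = align_scale_shift(pred, target)
    return sum(abs(scale * p + shift - t) for p, t in zip(pred, target)) / len(pred)
```

A prediction that differs from the ground truth only by an affine transform, such as `target = 2 * pred + 5`, should give an aligned error of approximately zero, which is why relative depth metrics are insensitive to the unknown scale and shift.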
