Self-supervised Learning for Dense Depth Estimation in Monocular Endoscopy
This work addresses dense depth estimation for monocular endoscopy and offers a practical solution that requires no manual interaction, though it appears incremental because it builds on existing self-supervised and multi-view stereo methods.
The paper proposes a self-supervised approach to dense depth estimation from monocular endoscopy data that needs neither manual labeling nor patient CT scans for training or application. Validation against patient CT scans, on sinus endoscopy data from two patients, shows submillimeter residual errors.
We present a self-supervised approach to training convolutional neural networks for dense depth estimation from monocular endoscopy data without a priori modeling of anatomy or shading. Our method only requires sequential data from monocular endoscopic videos and a multi-view stereo reconstruction method, e.g., structure from motion, that supervises learning in a sparse but accurate manner. Consequently, our method requires neither manual interaction, such as scaling or labeling, nor patient CT in the training and application phases. We demonstrate the performance of our method on sinus endoscopy data from two patients and validate depth prediction quantitatively using corresponding patient CT scans, where we found submillimeter residual errors.
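To make "sparse but accurate" supervision concrete, the following is a minimal sketch of a masked depth loss, not the loss used in the paper. It assumes a dense predicted depth map, a sparse depth map obtained by projecting structure-from-motion points into the frame, and a validity mask marking the pixels that have such a point. The function name `sparse_depth_loss` and the median-ratio scale alignment are illustrative assumptions; the alignment is included only because a structure-from-motion reconstruction is defined up to a global scale.

```python
import numpy as np


def sparse_depth_loss(pred, sparse, mask):
    """Mean absolute depth error over pixels that carry a sparse SfM depth.

    pred   : (H, W) dense predicted depth, assumed strictly positive
    sparse : (H, W) sparse depth from projected SfM points (hypothetical input)
    mask   : (H, W) nonzero where `sparse` holds a valid depth
    """
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        # No supervised pixels in this frame, so there is nothing to penalize.
        return 0.0

    p = np.asarray(pred, dtype=float)[mask]
    s = np.asarray(sparse, dtype=float)[mask]

    # Assumption: align the prediction to the SfM scale with a median ratio,
    # since SfM depths are only defined up to a global scale factor.
    scale = np.median(s / p)

    # Only masked pixels contribute; unsupervised pixels are ignored entirely.
    return float(np.mean(np.abs(scale * p - s)))


if __name__ == "__main__":
    pred = np.array([[1.0, 2.0], [4.0, 3.0]])
    sparse = np.array([[2.0, 4.0], [10.0, 0.0]])
    mask = np.array([[1, 1], [1, 0]])
    print(sparse_depth_loss(pred, sparse, mask))
```

In this sketch the pixel at the bottom right has no structure-from-motion point, so it has no influence on the loss. A network trained this way would have to learn dense depth from sparse targets alone, which is the setting the abstract describes.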