Structure-to-Image: Zero-Shot Depth Estimation in Colonoscopy via High-Fidelity Sim-to-Real Adaptation
This paper addresses the challenge of accurate depth estimation in colonoscopy for medical applications. It is an incremental improvement that offers a novel method for a known bottleneck.
The paper tackles the domain gap problem in monocular depth estimation for colonoscopy by proposing a Structure-to-Image paradigm that uses depth maps as a generative foundation. In zero-shot evaluations on a phantom dataset, a depth model fine-tuned on the generated data reduces RMSE by up to 44.18% compared to competing methods.
Monocular depth estimation (MDE) for colonoscopy is hampered by the domain gap between simulated and real-world images. Existing image-to-image translation methods, which use depth as a posterior constraint, often introduce structural distortions and specular-highlight artifacts because they fail to balance realism with structural consistency. To address this, we propose a Structure-to-Image paradigm that transforms the depth map from a passive constraint into an active generative foundation. We are the first to introduce phase congruency to colonoscopic domain adaptation, and we design a cross-level structure constraint to co-optimize geometric structures and fine-grained details such as vascular textures. In zero-shot evaluations conducted on a publicly available phantom dataset, the MDE model fine-tuned on our generated data achieved a maximum reduction of 44.18% in RMSE compared to competing methods. Our code is available at https://github.com/YyangJJuan/PC-S2I.git.
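To make the headline metric concrete, the following minimal sketch shows how an RMSE value and a relative RMSE reduction are typically computed. It is not taken from the released code: the function names `rmse` and `relative_reduction`, the rule of skipping non-positive ground-truth depths, and all toy values are assumptions chosen for illustration, and the numbers are not results from the paper.

```python
import math


def rmse(pred, gt):
    """Root-mean-square error over paired depth values.

    Assumption: non-positive ground-truth values mark invalid pixels
    and are skipped; the actual evaluation protocol may differ.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Toy depth values (hypothetical, arbitrary units).
gt = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [14.0, 16.0, 34.0, 36.0]  # each prediction is off by 4
ours_pred = [12.0, 18.0, 32.0, 38.0]      # each prediction is off by 2

baseline_rmse = rmse(baseline_pred, gt)
ours_rmse = rmse(ours_pred, gt)
reduction = relative_reduction(baseline_rmse, ours_rmse)
print(f"baseline={baseline_rmse:.2f} ours={ours_rmse:.2f} reduction={reduction:.2f}%")
```

Under this definition, a reported reduction of 44.18% would mean the fine-tuned model's RMSE is 55.82% of the compared method's RMSE on the same data.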