CV | AI | LG | Sep 16, 2025

ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors

arXiv:2509.13525v14 citations | h-index: 38 | Pac Symp Biocomput
Originality: Incremental advance
AI Analysis

This work addresses 3D scene understanding in colonoscopy for medical applications. It is rated incremental because it applies existing diffusion methods to a specific domain.

The paper tackles temporal inconsistency in depth estimation for colonoscopy videos by introducing ColonCrafter, a diffusion-based model trained on synthetic colonoscopy sequences. A style transfer step adapts real clinical videos to the synthetic training domain, and the model achieves state-of-the-art zero-shot performance on the C3VD dataset.

Three-dimensional (3D) scene understanding in colonoscopy presents significant challenges that necessitate automated methods for accurate depth estimation. However, existing depth estimation models for endoscopy struggle with temporal consistency across video sequences, limiting their applicability for 3D reconstruction. We present ColonCrafter, a diffusion-based depth estimation model that generates temporally consistent depth maps from monocular colonoscopy videos. Our approach learns robust geometric priors from synthetic colonoscopy sequences to generate temporally consistent depth maps. We also introduce a style transfer technique that preserves geometric structure while adapting real clinical videos to match our synthetic training domain. ColonCrafter achieves state-of-the-art zero-shot performance on the C3VD dataset, outperforming both general-purpose and endoscopy-specific approaches. Although full trajectory 3D reconstruction remains a challenge, we demonstrate clinically relevant applications of ColonCrafter, including 3D point cloud generation and surface coverage assessment.
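The abstract mentions 3D point cloud generation from predicted depth maps. As a rough illustration of what that step can look like, the sketch below back-projects a single depth map into camera-space 3D points. It assumes an ideal pinhole camera, which is an assumption made here and not something the abstract states; real colonoscopes often have wide-angle lenses that need a distortion model. The function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the paper.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) point cloud.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), all in pixels. This is an illustrative assumption,
    not the camera model used by the paper.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    # Stack to (H, W, 3), then flatten to one 3D point per pixel.
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy input: a flat surface 2 units in front of the camera.
depth = np.full((4, 6), 2.0)
points = depth_to_point_cloud(depth, fx=3.0, fy=3.0, cx=2.5, cy=1.5)
```

Each row of `points` is an (x, y, z) coordinate in the camera frame. Merging the clouds from many frames into one reconstruction would also need per-frame camera poses, which this sketch does not cover.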

