CV · Apr 23, 2025

PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation

arXiv:2504.17067v1 · 2 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses accurate depth estimation for endoscopy navigation and diagnostics in clinical settings. Its contribution is incremental, since it builds on existing image-translation and diffusion models.

The paper tackles the domain gap in colonoscopy depth estimation with a controllable sim-to-real translation framework that combines Stable Diffusion with ControlNet, conditioned on Per-Pixel Shading (PPS) maps. The result is more realistic translations and better downstream depth estimation than GAN-based methods such as MI-CycleGAN.

Accurate depth estimation enhances endoscopy navigation and diagnostics, but obtaining ground-truth depth in clinical settings is challenging. Synthetic datasets are often used for training, yet the domain gap limits generalization to real data. We propose a novel image-to-image translation framework that preserves structure while generating realistic textures from clinical data. Our key innovation integrates Stable Diffusion with ControlNet, conditioned on a latent representation extracted from a Per-Pixel Shading (PPS) map. PPS captures surface lighting effects, providing a stronger structural constraint than depth maps. Experiments show that our approach produces more realistic translations and improves depth estimation over the GAN-based MI-CycleGAN. Our code is publicly accessible at https://github.com/anaxqx/PPS-Ctrl.
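To make "per-pixel shading" concrete, here is a minimal sketch of how a shading map can be derived from a depth map. It is not the paper's PPS formulation, which this summary does not spell out. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, a single point light co-located with the camera (as on an endoscope tip), a Lambertian surface with unit albedo, and inverse-square falloff. The function names (backproject, estimate_normals, per_pixel_shading) are hypothetical.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points with a pinhole model."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def estimate_normals(points):
    """Approximate unit surface normals from finite differences of neighbouring 3D points."""
    dx = np.gradient(points, axis=1)  # change along image columns
    dy = np.gradient(points, axis=0)  # change along image rows
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12
    # Orient normals toward the camera, which looks along +z.
    facing_away = n[..., 2] > 0
    n[facing_away] *= -1
    return n


def per_pixel_shading(depth, fx, fy, cx, cy):
    """Assumed model: cos(angle between normal and light direction) / distance^2.

    The light sits at the camera origin, so the light direction at each
    surface point is -p / |p| and the distance is |p|.
    """
    points = backproject(depth, fx, fy, cx, cy)
    normals = estimate_normals(points)
    dist = np.linalg.norm(points, axis=-1)
    to_light = -points / dist[..., None]
    cos_term = np.clip(np.sum(normals * to_light, axis=-1), 0.0, None)
    return cos_term / dist**2
```

For example, `per_pixel_shading(np.full((5, 5), 2.0), 100.0, 100.0, 2.0, 2.0)` gives about 1/2² = 0.25 at the principal point and smaller values toward the corners. Under this assumed model, shading depends on surface orientation as well as distance. A small fold that barely changes depth values can still change the shading map noticeably, which is one plausible intuition for why a shading map can constrain structure more tightly than depth alone.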

Code Implementations: 1 repo
