CV | May 5, 2025

Sim2Real in endoscopy segmentation with a novel structure aware image translation

Clara Tomasini, Luis Riazuelo, Ana C. Murillo

arXiv:2505.02654v16.21 citations | h-index: 3 | Has Code | SASHIMI@MICCAI

Originality: Incremental advance

AI Analysis

This work addresses the tedious annotation burden on doctors and surgeons in endoscopy by enabling effective training without real labeled data, though it is incremental in that it builds on existing generative approaches.

The paper tackles the problem of training segmentation models for endoscopic images without real labeled data. It develops a novel image translation model that adds realistic texture to synthetic images while preserving scene structure, and demonstrates that models trained on the generated images can perform fold segmentation in colonoscopy images with no real annotations.

Automatic segmentation of anatomical landmarks in endoscopic images can assist doctors and surgeons with diagnosis, treatment, or medical training. However, obtaining the annotations required to train commonly used supervised learning methods is tedious and difficult, in particular for real images. While ground truth annotations are easier to obtain for synthetic data, models trained on such data often do not generalize well to real data. Generative approaches can add realistic texture to synthetic images, but struggle to maintain the structure of the original scene. The main contribution of this work is a novel image translation model that adds realistic texture to simulated endoscopic images while keeping the key scene layout information. Our approach produces realistic images in different endoscopy scenarios. We demonstrate that these images can be used to train a model for a challenging end task without any real labeled data. In particular, we demonstrate our approach on the task of fold segmentation in colonoscopy images. Folds are key anatomical landmarks that can occlude parts of the colon mucosa and possible polyps. After image-style translation, our approach generates realistic images that maintain the shape and location of the original folds better than existing methods. We run experiments both on a novel simulated dataset for fold segmentation and on real data from the EndoMapper (EM) dataset. All our newly generated data and new EM metadata are being released to facilitate further research, as no public benchmark is currently available for the task of fold segmentation.
