Exploring Semantic Consistency in Unpaired Image Translation to Generate Data for Surgical Applications
The work addresses the scarcity of labeled data in surgical applications, which stems from privacy and annotation constraints, but its contribution is incremental because it builds on existing image translation methods.
This study tackled the generation of labeled training data for surgical computer vision with unpaired image translation, focusing on preserving semantic consistency. It found that combining a structural-similarity loss with contrastive learning yielded translated images with higher semantic consistency, which were more effective as training data.
In surgical computer vision applications, obtaining labeled training data is challenging due to data-privacy concerns and the need for expert annotation. Unpaired image-to-image translation techniques have been explored to automatically generate large annotated datasets by translating synthetic images to the realistic domain. However, preserving the structure and semantic consistency between the input and translated images is difficult, particularly when the domains differ in the distribution of their semantic characteristics. This study empirically investigates unpaired image translation methods for generating suitable data in surgical applications, focusing explicitly on semantic consistency. We extensively evaluate various state-of-the-art image translation models on two challenging surgical datasets and on downstream semantic segmentation tasks. We find that a simple combination of a structural-similarity loss and contrastive learning yields the most promising results. Quantitatively, we show that data generated with this approach have higher semantic consistency and can be used more effectively as training data. The code is available at https://gitlab.com/nct_tso_public/constructs.
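The abstract does not spell out how the two objectives are combined. The following is a minimal NumPy sketch, assuming a CUT-style patch-wise contrastive (InfoNCE) term between features of the synthetic input and its translation, plus an SSIM term between the input image and the translated image. The function names, the box-window SSIM simplification (the reference SSIM uses a Gaussian window), the temperature `tau=0.07`, and the weights `lambda_ssim` and `lambda_nce` are illustrative assumptions, not the authors' implementation; the linked repository holds the actual code. The adversarial loss and the feature encoder are omitted, and random arrays stand in for images and patch features.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim(x: np.ndarray, y: np.ndarray, win: int = 7, data_range: float = 1.0) -> float:
    """Mean SSIM of two grayscale images using a uniform (box) window.

    Assumption: box window for brevity; reference SSIM uses a Gaussian window.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    # Local windows with shape (H - win + 1, W - win + 1, win, win)
    xw = sliding_window_view(x, (win, win))
    yw = sliding_window_view(y, (win, win))
    mu_x, mu_y = xw.mean(axis=(-2, -1)), yw.mean(axis=(-2, -1))
    var_x, var_y = xw.var(axis=(-2, -1)), yw.var(axis=(-2, -1))
    cov_xy = (xw * yw).mean(axis=(-2, -1)) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))


def patch_nce(feat_q: np.ndarray, feat_k: np.ndarray, tau: float = 0.07) -> float:
    """InfoNCE over N patch features of shape (N, D).

    Patch i of the translated image (query) should match patch i of the
    input (positive key) and not the other N - 1 input patches (negatives).
    """
    q = feat_q / np.linalg.norm(feat_q, axis=1, keepdims=True)
    k = feat_k / np.linalg.norm(feat_k, axis=1, keepdims=True)
    logits = q @ k.T / tau                         # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))      # positives lie on the diagonal


def generator_consistency_loss(src, fake, feat_src, feat_fake,
                               lambda_ssim=1.0, lambda_nce=1.0):
    """Hypothetical weighted sum of the structure and contrastive terms."""
    l_ssim = 1.0 - ssim(src, fake)           # 0 when local structure is identical
    l_nce = patch_nce(feat_fake, feat_src)   # query: translated, keys: input
    return lambda_ssim * l_ssim + lambda_nce * l_nce


# Usage with stand-in data: a "translated" image close to its input,
# and patch features that stay aligned location by location.
rng = np.random.default_rng(0)
src = rng.random((64, 64))
fake = np.clip(src + 0.05 * rng.standard_normal(src.shape), 0.0, 1.0)
feat_src = rng.standard_normal((256, 64))
feat_fake = feat_src + 0.1 * rng.standard_normal(feat_src.shape)
print(generator_consistency_loss(src, fake, feat_src, feat_fake))
```

Both terms shrink when the translation keeps the input's layout: SSIM penalizes changes in local mean, contrast, and correlation, while the contrastive term ties each output patch to the input patch at the same location rather than to other patches.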