Unpaired Translation from Semantic Label Maps to Images by Leveraging Domain-Specific Simulations
This work addresses the need for photorealistic image generation in applications such as medical training and virtual reality, where paired annotations are often unavailable. Its contribution is an incremental improvement over existing unpaired translation methods.
The paper tackles the problem of generating photorealistic images from simulated label maps without paired data. It uses a contrastive learning framework that leverages domain-specific simulations to reduce artifacts and enable bidirectional translation, and it reports realistic and scene-accurate results on datasets including laparoscopy and driving scenes.
Photorealistic image generation from simulated label maps is needed in several contexts, such as medical training in virtual reality. With conventional deep learning methods, this task requires images that are paired with semantic annotations, which are typically unavailable. We introduce a contrastive learning framework for generating photorealistic images from simulated label maps by learning from unpaired sets of both. Because the scenes in real images and label maps can differ substantially, existing unpaired image translation methods introduce scene-modification artifacts into the synthesized images. We use simulated images as surrogate targets for a contrastive loss, while ensuring consistency by using features from a reverse translation network. Our method enables bidirectional label-image translation, which we demonstrate in a variety of scenarios and datasets, including laparoscopy, ultrasound, and driving scenes. In comparisons with state-of-the-art unpaired translation methods, our proposed method generates realistic and scene-accurate translations.
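
The abstract does not give the form of the contrastive loss, so the following is only a minimal sketch under the assumption of a standard InfoNCE-style objective, which is common in contrastive unpaired translation. A feature of a patch in the generated image (the query) is pulled toward the feature of the corresponding patch in the surrogate target (the positive) and pushed away from features of other patches (the negatives). The function name `info_nce_loss`, the helpers, and the temperature value 0.07 are illustrative assumptions, not taken from the paper, and the feature vectors are assumed to be non-zero.

```python
import math


def _normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss for a single query feature (hypothetical helper).

    query:     feature of a patch in the generated image
    positive:  feature of the matching patch in the surrogate target
    negatives: list of features of non-matching patches
    """
    q = _normalize(query)
    # Index 0 holds the positive logit; the remaining entries are negatives.
    logits = [_dot(q, _normalize(positive)) / temperature]
    logits += [_dot(q, _normalize(n)) / temperature for n in negatives]
    # Cross-entropy with the positive as the target class,
    # computed with a numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# The loss is small when the query matches its positive and large otherwise.
matched = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
mismatched = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy example `matched` is close to zero and `mismatched` is much larger, which is the behavior a contrastive objective relies on. The consistency term based on the reverse translation network is not modeled here.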