CVDec 3, 2021
Panoptic-aware Image-to-Image TranslationLiyun Zhang, Photchara Ratsamee, Bowen Wang et al.
Despite remarkable progress in image translation, the complex scene with multiple discrepant objects remains a challenging problem. The translated images have low fidelity and tiny objects in fewer details causing unsatisfactory performance in object recognition. Without thorough object perception (i.e., bounding boxes, categories, and masks) of images as prior knowledge, the style transformation of each object will be difficult to track in translation. We propose panoptic-aware generative adversarial networks (PanopticGAN) for image-to-image translation together with a compact panoptic segmentation dataset. The panoptic perception (i.e., foreground instances and background semantics of the image scene) is extracted to achieve alignment between object content codes of the input domain and panoptic-level style codes sampled from the target style space, then refined by a proposed feature masking module for sharping object boundaries. The image-level combination between content and sampled style codes is also merged for higher fidelity image generation. Our proposed method was systematically compared with different competing methods and obtained significant improvement in both image quality and object recognition performance.
CVMar 26, 2018
REST: Real-to-Synthetic Transform for Illumination Invariant Camera LocalizationSota Shoman, Tomohiro Mashita, Alexander Plopski et al.
Accurate camera localization is an essential part of tracking systems. However, localization results are greatly affected by illumination. Including data collected under various lighting conditions can improve the robustness of the localization algorithm to lighting variation. However, this is very tedious and time consuming. By using synthesized images it is possible to easily accumulate a large variety of views under varying illumination and weather conditions. Despite continuously improving processing power and rendering algorithms, synthesized images do not perfectly match real images of the same scene, i.e. there exists a gap between real and synthesized images that also affects the accuracy of camera localization. To reduce the impact of this gap, we introduce "REal-to-Synthetic Transform (REST)." REST is an autoencoder-like network that converts real features to their synthetic counterpart. The converted features can then be matched against the accumulated database for robust camera localization. In our experiments REST improved feature matching accuracy under variable lighting conditions by approximately 30%. Moreover, our system outperforms state of the art CNN-based camera localization methods trained with synthetic images. We believe our method could be used to initialize local tracking and to simplify data accumulation for lighting robust localization.