CV · Dec 3, 2021
Panoptic-aware Image-to-Image Translation
Liyun Zhang, Photchara Ratsamee, Bowen Wang et al.
Despite remarkable progress in image translation, complex scenes with multiple discrepant objects remain a challenging problem. Translated images have low fidelity and render tiny objects with few details, which leads to unsatisfactory object recognition performance. Without thorough object perception (i.e., bounding boxes, categories, and masks) of images as prior knowledge, the style transformation of each object is difficult to track during translation. We propose panoptic-aware generative adversarial networks (PanopticGAN) for image-to-image translation, together with a compact panoptic segmentation dataset. The panoptic perception (i.e., foreground instances and background semantics of the image scene) is extracted to align object content codes of the input domain with panoptic-level style codes sampled from the target style space; the result is then refined by a proposed feature masking module to sharpen object boundaries. The image-level combination of content and sampled style codes is also merged for higher-fidelity image generation. Our proposed method was systematically compared with competing methods and obtained significant improvements in both image quality and object recognition performance.
CV · Oct 26, 2021
Transferring Domain-Agnostic Knowledge in Video Question Answering
Tianran Wu, Noa Garcia, Mayu Otani et al.
Video question answering (VideoQA) is designed to answer a given question based on a relevant video clip. Currently available large-scale datasets have made it possible to formulate VideoQA as the joint understanding of visual and language information. However, this training procedure is costly and still falls short of human performance. In this paper, we investigate a transfer learning method that introduces domain-agnostic knowledge and domain-specific knowledge. First, we develop a novel transfer learning framework, which finetunes the pre-trained model by applying domain-agnostic knowledge as the medium. Second, we construct a new VideoQA dataset with 21,412 human-generated question-answer samples for comparable transfer of knowledge. Our experiments show that (i) domain-agnostic knowledge is transferable and (ii) our proposed transfer learning framework can boost VideoQA performance effectively.
CV · Mar 26, 2018
REST: Real-to-Synthetic Transform for Illumination Invariant Camera Localization
Sota Shoman, Tomohiro Mashita, Alexander Plopski et al.
Accurate camera localization is an essential part of tracking systems. However, localization results are greatly affected by illumination. Including data collected under various lighting conditions can improve the robustness of the localization algorithm to lighting variation, but collecting such data is very tedious and time-consuming. By using synthesized images, it is possible to easily accumulate a large variety of views under varying illumination and weather conditions. Despite continuously improving processing power and rendering algorithms, synthesized images do not perfectly match real images of the same scene, i.e., there exists a gap between real and synthesized images that also affects the accuracy of camera localization. To reduce the impact of this gap, we introduce the "REal-to-Synthetic Transform (REST)." REST is an autoencoder-like network that converts real features to their synthetic counterparts. The converted features can then be matched against the accumulated database for robust camera localization. In our experiments, REST improved feature matching accuracy under variable lighting conditions by approximately 30%. Moreover, our system outperforms state-of-the-art CNN-based camera localization methods trained with synthetic images. We believe our method could be used to initialize local tracking and to simplify data accumulation for lighting-robust localization.
HC · Jan 16, 2018
Plane-Casting: 3D Cursor Control with a SmartPhone
Nicholas Katzakis, Kiyoshi Kiyokawa, Masahiro Hori et al.
We present Plane-Casting, a novel technique for 3D object manipulation from a distance that is especially suitable for smartphones. We describe two variations of Plane-Casting, Pivot and Free Plane-Casting, and present results from a pilot study. Results suggest that Pivot Plane-Casting is more suitable for quick, coarse movements whereas Free Plane-Casting is more suited to slower, precise motion. In a 3D movement task, Pivot Plane-Casting performed better quantitatively, but subjects preferred Free Plane-Casting overall.