CVOct 2, 2022
Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor EnvironmentYijun Cao, Xianshi Zhang, Fuya Luo et al.
PointGoal navigation in indoor environment is a fundamental task for personal robots to navigate to a specified point. Recent studies solved this PointGoal navigation task with near-perfect success rate in photo-realistically simulated environments, under the assumptions with noiseless actuation and most importantly, perfect localization with GPS and compass sensors. However, accurate GPS signalis difficult to be obtained in real indoor environment. To improve the PointGoal navigation accuracy without GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in unsupervised manner. Sepecifically, unsupervised VO computes the relative pose of the agent from the re-projection error of two adjacent frames, and then replaces the accurate GPS signal with the path integration. The pseudo position estimated by VO is used to train action integration which assists agent to update their internal perception of location and helps improve the success rate of navigation. The training and inference process only use RGB, depth, collision as well as self-action information. The experiments show that the proposed system achieves satisfactory results and outperforms the partially supervised learning algorithms on the popular Gibson dataset.
CVApr 29, 2021Code
Thermal Infrared Image Colorization for Nighttime Driving Scenes with Top-Down Guided AttentionFuya Luo, Yunhan Li, Guang Zeng et al.
Benefitting from insensitivity to light and high penetration of foggy environments, infrared cameras are widely used for sensing in nighttime traffic scenes. However, the low contrast and lack of chromaticity of thermal infrared (TIR) images hinder the human interpretation and portability of high-level computer vision algorithms. Colorization to translate a nighttime TIR image into a daytime color (NTIR2DC) image may be a promising way to facilitate nighttime scene perception. Despite recent impressive advances in image translation, semantic encoding entanglement and geometric distortion in the NTIR2DC task remain under-addressed. Hence, we propose a toP-down attEntion And gRadient aLignment based GAN, referred to as PearlGAN. A top-down guided attention module and an elaborate attentional loss are first designed to reduce the semantic encoding ambiguity during translation. Then, a structured gradient alignment loss is introduced to encourage edge consistency between the translated and input images. In addition, pixel-level annotation is carried out on a subset of FLIR and KAIST datasets to evaluate the semantic preservation performance of multiple translation methods. Furthermore, a new metric is devised to evaluate the geometric consistency in the translation process. Extensive experiments demonstrate the superiority of the proposed PearlGAN over other image translation methods for the NTIR2DC task. The source code and labeled segmentation masks will be available at \url{https://github.com/FuyaLuo/PearlGAN/}.
CVJun 5, 2025
Toward Better SSIM Loss for Unsupervised Monocular Depth EstimationYijun Cao, Fuya Luo, Yongjie Li
Unsupervised monocular depth learning generally relies on the photometric relation among temporally adjacent images. Most of previous works use both mean absolute error (MAE) and structure similarity index measure (SSIM) with conventional form as training loss. However, they ignore the effect of different components in the SSIM function and the corresponding hyperparameters on the training. To address these issues, this work proposes a new form of SSIM. Compared with original SSIM function, the proposed new form uses addition rather than multiplication to combine the luminance, contrast, and structural similarity related components in SSIM. The loss function constructed with this scheme helps result in smoother gradients and achieve higher performance on unsupervised depth estimation. We conduct extensive experiments to determine the relatively optimal combination of parameters for our new SSIM. Based on the popular MonoDepth approach, the optimized SSIM loss function can remarkably outperform the baseline on the KITTI-2015 outdoor dataset.
CVNov 22, 2021
Learning Generalized Visual Odometry Using Position-Aware Optical Flow and Geometric Bundle AdjustmentYijun Cao, Xianshi Zhang, Fuya Luo et al.
Recent visual odometry (VO) methods incorporating geometric algorithm into deep-learning architecture have shown outstanding performance on the challenging monocular VO task. Despite encouraging results are shown, previous methods ignore the requirement of generalization capability under noisy environment and various scenes. To address this challenging issue, this work first proposes a novel optical flow network (PANet). Compared with previous methods that predict optical flow as a direct regression task, our PANet computes optical flow by predicting it into the discrete position space with optical flow probability volume, and then converting it to optical flow. Next, we improve the bundle adjustment module to fit the self-supervised training pipeline by introducing multiple sampling, ego-motion initialization, dynamic damping factor adjustment, and Jacobi matrix weighting. In addition, a novel normalized photometric loss function is advanced to improve the depth estimation accuracy. The experiments show that the proposed system not only achieves comparable performance with other state-of-the-art self-supervised learning-based methods on the KITTI dataset, but also significantly improves the generalization capability compared with geometry-based, learning-based and hybrid VO systems on the noisy KITTI and the challenging outdoor (KAIST) scenes.