CV | Jan 30, 2023 | Code
Edge-guided Multi-domain RGB-to-TIR image Translation for Training Vision Tasks with Challenging Labels
Dong-Guw Lee, Myung-Hwan Jeon, Younggun Cho et al.
The scarcity of annotated thermal infrared (TIR) image datasets not only prevents TIR image-based deep learning networks from achieving performance comparable to that of their RGB counterparts, but also limits supervised learning of TIR image-based tasks with challenging labels. As a remedy, we propose a modified multi-domain RGB-to-TIR image translation model focused on edge preservation, which makes it possible to employ annotated RGB images with challenging labels. Our proposed method not only preserves key details in the original image but also leverages the optimal TIR style code to portray accurate TIR characteristics in the translated image, when applied to both synthetic and real-world RGB images. Using our translation model, we have enabled supervised learning of deep TIR image-based optical flow estimation and object detection, reducing the end-point error of deep TIR optical flow estimation by 56.5% on average and obtaining a best object detection mAP of 23.9%. Our code and supplementary materials are available at https://github.com/rpmsnu/sRGB-TIR.
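The end-point error quoted above is the standard optical flow metric: the mean Euclidean distance between predicted and ground-truth flow vectors. The sketch below shows how that metric and a relative reduction are typically computed. The function names and the toy flow fields are illustrative assumptions, not the authors' evaluation code or data.

```python
import numpy as np

def end_point_error(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow.

    Both inputs have shape (H, W, 2) and hold (u, v) displacements in pixels.
    """
    diff = np.asarray(flow_pred, dtype=np.float64) - np.asarray(flow_gt, dtype=np.float64)
    return float(np.linalg.norm(diff, axis=-1).mean())

def relative_reduction(epe_before, epe_after):
    """Relative reduction of the end-point error, in percent."""
    return 100.0 * (epe_before - epe_after) / epe_before

# Toy example with made-up flow fields (not data from the paper).
gt = np.zeros((2, 2, 2))
pred = np.zeros((2, 2, 2))
pred[..., 0] = 3.0  # horizontal error of 3 px at every pixel
pred[..., 1] = 4.0  # vertical error of 4 px at every pixel
epe = end_point_error(pred, gt)  # each pixel is off by sqrt(3^2 + 4^2) = 5 px
```

A relative reduction computed this way is what a figure such as "56.5% on average" would express, assuming it is averaged over the evaluated sequences.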
RO | Apr 13, 2022
ViViD++: Vision for Visibility Dataset
Alex Junho Lee, Younggun Cho, Young-sik Shin et al.
In this paper, we present a dataset capturing diverse visual data formats that target varying luminance conditions. While RGB cameras provide rich and intuitive information, changes in lighting conditions can result in catastrophic failure for robotic applications based on vision sensors. Approaches to overcoming illumination problems have included developing more robust algorithms or using other types of visual sensors, such as thermal and event cameras. Despite the potential of alternative sensors, there are still few datasets that include them. Thus, we provide a dataset recorded from alternative vision sensors, either handheld or mounted on a car, repeatedly in the same space but under different conditions. We aim to acquire visible information from co-aligned alternative vision sensors. Our sensor system collects data more independently from visible light intensity by measuring the amount of infrared dissipation, depth by structured reflection, and instantaneous temporal changes in luminance. We provide these measurements along with inertial sensor data and ground truth for developing robust visual SLAM under poor illumination. The full dataset is available at: https://visibilitydataset.github.io/
RO | Oct 24, 2024 | Code
Thermal Chameleon: Task-Adaptive Tone-mapping for Radiometric Thermal-Infrared images
Dong-Guw Lee, Jeongyun Kim, Younggun Cho et al.
Thermal Infrared (TIR) imaging provides robust perception for navigating in challenging outdoor environments but faces issues with poor texture and low image contrast due to its 14/16-bit format. Conventional methods utilize various tone-mapping methods to enhance the contrast and photometric consistency of TIR images; however, the choice of tone-mapping depends largely on knowing task- and temperature-dependent priors to work well. In this paper, we present Thermal Chameleon Network (TCNet), a task-adaptive tone-mapping approach for RAW 14-bit TIR images. Given the same image, TCNet tone-maps different representations of TIR images tailored for each specific task, eliminating the heuristic image rescaling preprocessing and the reliance on extensive prior knowledge of the scene temperature or task-specific characteristics. TCNet exhibits improved generalization performance across object detection and monocular depth estimation, with minimal computational overhead and modular integration into existing architectures for various tasks. Project Page: https://github.com/donkeymouse/ThermalChameleon
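For context, the heuristic rescaling that the abstract contrasts against can be as simple as a linear min-max mapping from raw counts to 8 bits. The sketch below illustrates that conventional baseline only; it is not TCNet, and the function name, the optional clipping bounds, and the toy frame are assumptions made for illustration.

```python
import numpy as np

def minmax_tone_map(raw14, low=None, high=None):
    """Linearly rescale a RAW 14-bit TIR frame to 8 bits.

    low/high are clipping bounds in raw counts. When omitted, the frame's own
    minimum and maximum are used, which makes the result scene dependent.
    """
    raw = np.asarray(raw14, dtype=np.float64)
    low = raw.min() if low is None else float(low)
    high = raw.max() if high is None else float(high)
    if high <= low:
        # Flat frame or invalid bounds: nothing to stretch.
        return np.zeros(raw.shape, dtype=np.uint8)
    scaled = (np.clip(raw, low, high) - low) / (high - low)
    return np.round(scaled * 255.0).astype(np.uint8)

# Toy 2x2 frame of raw counts (made-up values).
frame = np.array([[8000, 8100], [8200, 8400]], dtype=np.uint16)
img8 = minmax_tone_map(frame)                   # bounds taken from the frame
img8_fixed = minmax_tone_map(frame, 8000, 8200) # hand-picked bounds
```

Choosing `low` and `high` by hand is where the temperature prior enters: fixed bounds keep brightness consistent across frames but require knowing the expected temperature range in advance.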
RO | Dec 3, 2025
MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization
Gihyeon Lee, Jungwoo Lee, Juwon Kim et al.
Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic ambiguity intensifies object misclassification and increases the likelihood of incorrect associations, which in turn can cause significant errors in the estimated pose. Thus, in this letter, we propose a multi-label likelihood-based semantic graph matching framework for object-level global localization. The key idea is to exploit multi-label graph representations, rather than single-label alternatives, to capture and leverage the inherent semantic context of object observations. Based on these representations, our approach enhances semantic correspondence across graphs by combining the likelihood of each node with the maximum likelihood of its neighbors via context-aware likelihood propagation. For rigorous validation, data association and pose estimation performance are evaluated under both closed-set and open-set detection configurations. In addition, we demonstrate the scalability of our approach to large-vocabulary object categories in both real-world indoor scenes and synthetic environments.
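One possible reading of "combining the likelihood of each node with the maximum likelihood of its neighbors" is sketched below. The combination rule (node likelihood multiplied by the average of each neighbor's best match), the function name, and the toy graphs are assumptions for illustration; the paper's actual formulation may differ.

```python
import numpy as np

def propagate_likelihood(L, nbrs_q, nbrs_m):
    """Context-aware score for every (query node, map node) pair.

    L[i, j]   : likelihood that query node i matches map node j.
    nbrs_q[i] : list of neighbors of query node i in the query graph.
    nbrs_m[j] : list of neighbors of map node j in the map graph.
    """
    L = np.asarray(L, dtype=np.float64)
    S = np.zeros_like(L)
    for i in range(L.shape[0]):
        for j in range(L.shape[1]):
            if nbrs_q[i] and nbrs_m[j]:
                # Each neighbor of i takes its best match among the neighbors of j.
                best = [L[u, nbrs_m[j]].max() for u in nbrs_q[i]]
                context = float(np.mean(best))
            else:
                context = 1.0  # no context available: keep the node likelihood
            S[i, j] = L[i, j] * context
    return S

# Toy case: query node 0 is equally likely to be map node 0 or map node 2.
L = np.array([[0.6, 0.1, 0.6, 0.1],
              [0.1, 0.8, 0.1, 0.2]])
nbrs_q = [[1], [0]]            # query graph: 0 - 1
nbrs_m = [[1], [0], [3], [2]]  # map graph: 0 - 1 and 2 - 3
S = propagate_likelihood(L, nbrs_q, nbrs_m)
matches = S.argmax(axis=1)
```

In this toy case the neighbor of query node 0 matches map node 1 well, so the tie between map nodes 0 and 2 resolves in favor of node 0, which is adjacent to node 1.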
RO | Mar 8
GSAT: Geometric Traversability Estimation using Self-supervised Learning with Anomaly Detection for Diverse Terrains
Dongjin Cho, Miryeong Park, Juhui Lee et al.
Safe autonomous navigation requires reliable estimation of environmental traversability. Traditional methods have relied on semantic or geometry-based approaches with human-defined thresholds, but these methods often yield unreliable predictions due to the inherent subjectivity of human supervision. While self-supervised approaches enable robots to learn from their own experience, they still face a fundamental challenge: the positive-only learning problem. To address these limitations, recent studies have employed Positive-Unlabeled (PU) learning, where the core challenge is identifying positive samples without explicit negative supervision. In this work, we propose GSAT, which addresses these limitations by constructing a positive hypersphere in latent space to classify traversable regions through anomaly detection without requiring additional prototypes (e.g., unlabeled or negative). Furthermore, our approach employs joint learning of anomaly classification and traversability prediction to more efficiently utilize robot experience. We comprehensively evaluate the proposed framework through ablation studies, validation on heterogeneous real-world robotic platforms, and autonomous navigation demonstrations in simulation environments.
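The idea of a positive hypersphere can be illustrated with a fixed-embedding simplification in the spirit of one-class (Deep SVDD-style) anomaly detection: fit a center and radius on embeddings of traversed terrain, then treat anything outside the sphere as anomalous. This is a sketch under assumptions, not GSAT itself. The function names, the quantile-based radius, and the random embeddings standing in for a learned encoder are all hypothetical.

```python
import numpy as np

def fit_hypersphere(pos_embeddings, quantile=0.95):
    """Fit a center and radius from positive (traversed) embeddings only."""
    z = np.asarray(pos_embeddings, dtype=np.float64)
    center = z.mean(axis=0)
    dists = np.linalg.norm(z - center, axis=1)
    # The radius encloses the chosen fraction of positive samples.
    radius = float(np.quantile(dists, quantile))
    return center, radius

def is_traversable(embeddings, center, radius):
    """True where an embedding lies inside the positive hypersphere."""
    d = np.linalg.norm(np.asarray(embeddings, dtype=np.float64) - center, axis=1)
    return d <= radius

# Stand-in for encoder outputs on terrain the robot has driven over.
rng = np.random.default_rng(0)
positives = rng.normal(0.0, 0.1, size=(200, 8))
center, radius = fit_hypersphere(positives)

# One query near the positive cluster, one far away from it.
queries = np.vstack([np.zeros(8), np.full(8, 5.0)])
labels = is_traversable(queries, center, radius)
```

No negative or unlabeled samples are needed to fit the sphere, which is the property the abstract emphasizes. In a learned system the encoder would be trained so that positives cluster tightly around the center.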
RO | Mar 6
KISS-IMU: Self-supervised Inertial Odometry with Motion-balanced Learning and Uncertainty-aware Inference
Jiwon Choi, Hogyun Kim, Geonmo Yang et al.
Inertial measurement units (IMUs), which provide high-frequency linear acceleration and angular velocity measurements, serve as fundamental sensing modalities in robotic systems. Recent advances in deep neural networks have led to remarkable progress in inertial odometry. However, the heavy reliance on ground truth data during training fundamentally limits scalability and generalization to unseen and diverse environments. We propose KISS-IMU, a novel self-supervised inertial odometry framework that eliminates ground truth dependency by leveraging simple LiDAR-based ICP registration and pose graph optimization as a supervisory signal. Our approach embodies two key principles: keeping the IMU stable through motion-aware balanced training and keeping the IMU strong through uncertainty-driven adaptive weighting during inference. To evaluate performance across diverse motion patterns and scenarios, we conducted comprehensive experiments on various real-world platforms, including quadruped robots. Importantly, we train only the IMU network in a self-supervised manner, with LiDAR serving solely as a lightweight supervisory signal rather than requiring additional learnable processes. This design enables the framework to ensure robustness without relying on joint multi-modal learning or ground truth supervision. The supplementary materials are available at https://sparolab.github.io/research/kiss_imu.
RO | Feb 27, 2019
DeepLO: Geometry-Aware Deep LiDAR Odometry
Younggun Cho, Giseop Kim, Ayoung Kim
Recently, learning-based ego-motion estimation approaches have drawn strong interest from studies mostly focusing on visual perception. These groundbreaking works focus on unsupervised learning for odometry estimation but mostly for visual sensors. Compared to images, a learning-based approach using Light Detection and Ranging (LiDAR) has been reported in only a few studies where, most often, a supervised learning framework is proposed. In this paper, we propose a novel approach to geometry-aware deep LiDAR odometry trainable via both supervised and unsupervised frameworks. We incorporate the Iterative Closest Point (ICP) algorithm into a deep-learning framework and show the reliability of the proposed pipeline. We provide two loss functions that allow switching between supervised and unsupervised learning depending on the ground-truth validity in the training phase. An evaluation using the KITTI and Oxford RobotCar datasets demonstrates the strong pose accuracy and efficiency of the proposed method.
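As background for the ICP component mentioned above, the classic point-to-point algorithm alternates between nearest-neighbor matching and a closed-form rigid alignment (Kabsch/SVD). The sketch below shows that textbook loop with brute-force matching. It is not the DeepLO pipeline or its loss functions, and the function names and the toy grid are illustrative assumptions.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ~= dst_i for paired points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, iters=10):
    """Point-to-point ICP with brute-force nearest-neighbor matching."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # Pair every current source point with its closest target point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose the incremental update with the running estimate.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Toy scan: a 5x5x5 grid moved by a small known rotation and translation.
g = np.arange(-2, 3, dtype=np.float64)
src = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
angle = np.deg2rad(2.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.05, 0.05])
dst = src @ R_true.T + t_true
R_est, t_est = icp(src, dst)
```

The motion in this toy case is small relative to the point spacing, so nearest-neighbor matching finds the correct pairs on the first iteration. Real scans need a good initial guess and a spatial index such as a k-d tree instead of the quadratic distance matrix used here.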
CV | Jul 21, 2018
Generic Camera Attribute Control using Bayesian Optimization
Joowan Kim, Younggun Cho, Ayoung Kim
Cameras are the most widely exploited sensors in both the robotics and computer vision communities. Despite their popularity, two dominant attributes (i.e., gain and exposure time) have been determined empirically, and images are captured in a very passive manner. In this paper, we present an active and generic camera attribute control scheme using Bayesian optimization. We extend our previous work [1] in two aspects. First, we propose a method that jointly controls camera gain and exposure time. Second, to speed up the Bayesian optimization process, we introduce image synthesis using the camera response function (CRF). These synthesized images allowed us to reduce the image acquisition time during the Bayesian optimization phase, substantially improving overall control performance. The proposed method is validated in both indoor and outdoor environments where lighting conditions change rapidly. Supplementary material is available at https://youtu.be/XTYR_Mih3OU .
RO | Mar 16, 2018
Complex Urban LiDAR Data Set
Jinyong Jeong, Younggun Cho, Young-Sik Shin et al.
This paper presents a Light Detection and Ranging (LiDAR) data set that targets complex urban environments. Urban environments with high-rise buildings and congested traffic pose a significant challenge for many robotics applications. The presented data set is unique in that it captures the genuine features of an urban environment (e.g., metropolitan areas, large building complexes, and underground parking lots). Data from two-dimensional (2D) and three-dimensional (3D) LiDARs, which are typical types of LiDAR sensors, are provided in the data set. The two 16-ray 3D LiDARs are tilted on both sides for maximal coverage. One 2D LiDAR faces backward while the other faces forward to collect data of roads and buildings, respectively. Raw sensor data from the Fiber Optic Gyro (FOG), Inertial Measurement Unit (IMU), and Global Positioning System (GPS) are presented in a file format for vehicle pose estimation. The pose information of the vehicle, estimated at 100 Hz, is also presented after applying the graph simultaneous localization and mapping (SLAM) algorithm. For the convenience of development, the file player and data viewer for the Robot Operating System (ROS) environment were also released via the web page. The full data sets are available at: http://irap.kaist.ac.kr/dataset. On this website, a 3D preview of each data set is provided using WebGL.