Hyun Myung

RO
h-index19
47papers
1,407citations
Novelty49%
AI Score60

47 Papers

ROJul 25, 2022Code
Patchwork++: Fast and Robust Ground Segmentation Solving Partial Under-Segmentation Using 3D Point Cloud

Seungjae Lee, Hyungtae Lim, Hyun Myung

In the field of 3D perception using 3D LiDAR sensors, ground segmentation is an essential task for various purposes, such as traversable area detection and object recognition. Under these circumstances, several ground segmentation methods have been proposed. However, some limitations are still encountered. First, some ground segmentation methods require fine-tuning of parameters depending on the surroundings, which is excessively laborious and time-consuming. Moreover, even if the parameters are well adjusted, a partial under-segmentation problem can still emerge, which implies ground segmentation failures in some regions. Finally, ground segmentation methods typically fail to estimate an appropriate ground plane when the ground is above another structure, such as a retaining wall. To address these problems, we propose a robust ground segmentation method called Patchwork++, an extension of Patchwork. Patchwork++ exploits adaptive ground likelihood estimation (A-GLE) to calculate appropriate parameters adaptively based on the previous ground segmentation results. Moreover, temporal ground revert (TGR) alleviates a partial under-segmentation problem by using the temporary ground property. Also, region-wise vertical plane fitting (R-VPF) is introduced to segment the ground plane properly even if the ground is elevated with different layers. Finally, we present reflected noise removal (RNR) to eliminate virtual noise points efficiently based on the 3D LiDAR reflection model. We demonstrate the qualitative and quantitative evaluations using a SemanticKITTI dataset. Our code is available at https://github.com/url-kaist/patchwork-plusplus

CVMar 13, 2022Code
A Single Correspondence Is Enough: Robust Global Registration to Avoid Degeneracy in Urban Environments

Hyungtae Lim, Suyong Yeon, Soohyun Ryu et al.

Global registration using 3D point clouds is a crucial technology for mobile platforms to achieve localization or manage loop-closing situations. In recent years, numerous researchers have proposed global registration methods to address a large number of outlier correspondences. Unfortunately, the degeneracy problem, which represents the phenomenon in which the number of estimated inliers becomes lower than three, is still potentially inevitable. To tackle the problem, a degeneracy-robust decoupling-based global registration method is proposed, called Quatro. In particular, our method employs quasi-SO(3) estimation by leveraging the Atlanta world assumption in urban environments to avoid degeneracy in rotation estimation. Thus, the minimum degree of freedom (DoF) of our method is reduced from three to one. As verified in indoor and outdoor 3D LiDAR datasets, our proposed method yields robust global registration performance compared with other global registration methods, even for distant point cloud pairs. Furthermore, the experimental results confirm the applicability of our method as a coarse alignment. Our code is available: https://github.com/url-kaist/quatro.

CVSep 23, 2024Code
KISS-Matcher: Fast and Robust Point Cloud Registration Revisited

Hyungtae Lim, Daebeom Kim, Gunhee Shin et al.

While global point cloud registration systems have advanced significantly in all aspects, many studies have focused on specific components, such as feature extraction, graph-theoretic pruning, or pose solvers. In this paper, we take a holistic view on the registration problem and develop an open-source and versatile C++ library for point cloud registration, called KISS-Matcher. KISS-Matcher combines a novel feature detector, Faster-PFH, that improves over the classical fast point feature histogram (FPFH). Moreover, it adopts a $k$-core-based graph-theoretic pruning to reduce the time complexity of rejecting outlier correspondences. Finally, it combines these modules in a complete, user-friendly, and ready-to-use pipeline. As verified by extensive experiments, KISS-Matcher has superior scalability and broad applicability, achieving a substantial speed-up compared to state-of-the-art outlier-robust registration pipelines while preserving accuracy. Our code will be available at https://github.com/MIT-SPARK/KISS-Matcher.

ROJun 1, 2022Code
PaGO-LOAM: Robust Ground-Optimized LiDAR Odometry

Dong-Uk Seo, Hyungtae Lim, Seungjae Lee et al.

Numerous researchers have conducted studies to achieve fast and robust ground-optimized LiDAR odometry methods for terrestrial mobile platforms. In particular, ground-optimized LiDAR odometry usually employs ground segmentation as a preprocessing method. This is because most of the points in a 3D point cloud captured by a 3D LiDAR sensor on a terrestrial platform are from the ground. However, the effect of the performance of ground segmentation on LiDAR odometry is still not closely examined. In this paper, a robust ground-optimized LiDAR odometry framework is proposed to facilitate the study to check the effect of ground segmentation on LiDAR SLAM based on the state-of-the-art (SOTA) method. By using our proposed odometry framework, it is easy and straightforward to test whether ground segmentation algorithms help extract well-described features and thus improve SLAM performance. In addition, by leveraging the SOTA ground segmentation method called Patchwork, which shows robust ground segmentation even in complex and uneven urban environments with little performance perturbation, a novel ground-optimized LiDAR odometry is proposed, called PaGO-LOAM. The methods were tested using the KITTI odometry dataset. \textit{PaGO-LOAM} shows robust and accurate performance compared with the baseline method. Our code is available at https://github.com/url-kaist/AlterGround-LeGO-LOAM.

ROApr 13, 2022
ViViD++: Vision for Visibility Dataset

Alex Junho Lee, Younggun Cho, Young-sik Shin et al.

In this paper, we present a dataset capturing diverse visual data formats that target varying luminance conditions. While RGB cameras provide nourishing and intuitive information, changes in lighting conditions potentially result in catastrophic failure for robotic applications based on vision sensors. Approaches overcoming illumination problems have included developing more robust algorithms or other types of visual sensors, such as thermal and event cameras. Despite the alternative sensors' potential, there still are few datasets with alternative vision sensors. Thus, we provided a dataset recorded from alternative vision sensors, by handheld or mounted on a car, repeatedly in the same space but in different conditions. We aim to acquire visible information from co-aligned alternative vision sensors. Our sensor system collects data more independently from visible light intensity by measuring the amount of infrared dissipation, depth by structured reflection, and instantaneous temporal changes in luminance. We provide these measurements along with inertial sensors and ground-truth for developing robust visual SLAM under poor illumination. The full dataset is available at: https://visibilitydataset.github.io/

CVAug 12, 2024
HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors

Hyungtae Lim, Seoyeon Jang, Benedikt Mersch et al.

Moving object segmentation (MOS) using a 3D light detection and ranging (LiDAR) sensor is crucial for scene understanding and identification of moving objects. Despite the availability of various types of 3D LiDAR sensors in the market, MOS research still predominantly focuses on 3D point clouds from mechanically spinning omnidirectional LiDAR sensors. Thus, we are, for example, lacking a dataset with MOS labels for point clouds from solid-state LiDAR sensors which have irregular scanning patterns. In this paper, we present a labeled dataset, called \textit{HeLiMOS}, that enables to test MOS approaches on four heterogeneous LiDAR sensors, including two solid-state LiDAR sensors. Furthermore, we introduce a novel automatic labeling method to substantially reduce the labeling effort required from human annotators. To this end, our framework exploits an instance-aware static map building approach and tracking-based false label filtering. Finally, we provide experimental results regarding the performance of commonly used state-of-the-art MOS approaches on HeLiMOS that suggest a new direction for a sensor-agnostic MOS, which generally works regardless of the type of LiDAR sensors used to capture 3D point clouds. Our dataset is available at https://sites.google.com/view/helimos.

40.9CVMay 22Code
LQ-rPPG: A Label-Quantized Coarse-to-Fine Learning Framework for Remote Physiological Measurement

Jun Seong Lee, Samyeul Noh, Changki Sung et al.

Remote photoplethysmography (rPPG) enables non-contact measurement of physiological signals from facial videos, offering strong potential for remote healthcare and daily health monitoring. Driven by this potential, various deep learning-based rPPG methods have been proposed to improve rPPG estimation. However, previous deep learning-based rPPG methods have paid little attention to the quality of training labels and their impact on model learning. Contact-based PPG signals used as training labels often contain noise and variability caused by motion artifacts, inconsistent sensor contact, and morphological distortions. Such label inconsistency can lead models to overfit to the label noise and variability and consequently degrade generalization performance. To address this issue, we propose LQ-rPPG, a label-quantized coarse-to-fine learning framework for robust rPPG estimation. LQ-rPPG consists of a label quantization module and a coarse-to-fine rPPG estimation model. The label quantization module transforms continuous PPG signals into multi-bit quantized pseudo labels with reduced noise and variability. The coarse-to-fine estimation model progressively refines rPPG signals under hierarchical supervision guided by the multi-bit pseudo labels. This design alleviates overfitting to label-specific variations and enables the model to learn structured and consistent representations. As a result, LQ-rPPG achieves robust and generalizable rPPG estimation even under challenging conditions. Experiments on multiple benchmark datasets demonstrate that LQ-rPPG achieves strong performance in both intra- and cross-dataset evaluations, while reducing parameters and multiply-accumulate operations by 88% and 29%, respectively, and increasing throughput by 191%. The code is available at https://github.com/Anonymous-repo-code/LQ-rPPG.

CVJul 19, 2022
eCDT: Event Clustering for Simultaneous Feature Detection and Tracking-

Sumin Hu, Yeeun Kim, Hyungtae Lim et al.

Contrary to other standard cameras, event cameras interpret the world in an entirely different manner; as a collection of asynchronous events. Despite event camera's unique data output, many event feature detection and tracking algorithms have shown significant progress by making detours to frame-based data representations. This paper questions the need to do so and proposes a novel event data-friendly method that achieve simultaneous feature detection and tracking, called event Clustering-based Detection and Tracking (eCDT). Our method employs a novel clustering method, named as k-NN Classifier-based Spatial Clustering and Applications with Noise (KCSCAN), to cluster adjacent polarity events to retrieve event trajectories.With the aid of a Head and Tail Descriptor Matching process, event clusters that reappear in a different polarity are continually tracked, elongating the feature tracks. Thanks to our clustering approach in spatio-temporal space, our method automatically solves feature detection and feature tracking simultaneously. Also, eCDT can extract feature tracks at any frequency with an adjustable time window, which does not corrupt the high temporal resolution of the original event data. Our method achieves 30% better feature tracking ages compared with the state-of-the-art approach while also having a low error approximately equal to it.

CVApr 29, 2022
Struct-MDC: Mesh-Refined Unsupervised Depth Completion Leveraging Structural Regularities from Visual SLAM

Jinwoo Jeon, Hyunjun Lim, Dong-Uk Seo et al.

Feature-based visual simultaneous localization and mapping (SLAM) methods only estimate the depth of extracted features, generating a sparse depth map. To solve this sparsity problem, depth completion tasks that estimate a dense depth from a sparse depth have gained significant importance in robotic applications like exploration. Existing methodologies that use sparse depth from visual SLAM mainly employ point features. However, point features have limitations in preserving structural regularities owing to texture-less environments and sparsity problems. To deal with these issues, we perform depth completion with visual SLAM using line features, which can better contain structural regularities than point features. The proposed methodology creates a convex hull region by performing constrained Delaunay triangulation with depth interpolation using line features. However, the generated depth includes low-frequency information and is discontinuous at the convex hull boundary. Therefore, we propose a mesh depth refinement (MDR) module to address this problem. The MDR module effectively transfers the high-frequency details of an input image to the interpolated depth and plays a vital role in bridging the conventional and deep learning-based approaches. The Struct-MDC outperforms other state-of-the-art algorithms on public and our custom datasets, and even outperforms supervised methodologies for some metrics. In addition, the effectiveness of the proposed MDR module is verified by a rigorous ablation study.

63.7ROApr 17Code
GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow

Dong-Uk Seo, Jinwoo Jeon, Eungchang Mason Lee et al.

Gaussian splatting has recently gained traction as a compelling map representation for SLAM systems, enabling dense and photo-realistic scene modeling. However, its application to monocular SLAM remains challenging due to the lack of reliable geometric cues from monocular input. Without geometric supervision, mapping or tracking could fall in local-minima, resulting in structural degeneracies and inaccuracies. To address this challenge, we propose GaussianFlow SLAM, a monocular 3DGS-SLAM that leverages optical flow as a geometry-aware cue to guide the optimization of both the scene structure and camera poses. By encouraging the projected motion of Gaussians, termed GaussianFlow, to align with the optical flow, our method introduces consistent structural cues to regularize both map reconstruction and pose estimation. Furthermore, we introduce normalized error-based densification and pruning modules to refine inactive and unstable Gaussians, thereby contributing to improved map quality and pose accuracy. Experiments conducted on public datasets demonstrate that our method achieves superior rendering quality and tracking accuracy compared with state-of-the-art algorithms. The source code is available at: https://github.com/url-kaist/gaussianflow-slam.

ROApr 17, 2023
(LC)$^2$: LiDAR-Camera Loop Constraints For Cross-Modal Place Recognition

Alex Junho Lee, Seungwon Song, Hyungtae Lim et al.

Localization has been a challenging task for autonomous navigation. A loop detection algorithm must overcome environmental changes for the place recognition and re-localization of robots. Therefore, deep learning has been extensively studied for the consistent transformation of measurements into localization descriptors. Street view images are easily accessible; however, images are vulnerable to appearance changes. LiDAR can robustly provide precise structural information. However, constructing a point cloud database is expensive, and point clouds exist only in limited places. Different from previous works that train networks to produce shared embedding directly between the 2D image and 3D point cloud, we transform both data into 2.5D depth images for matching. In this work, we propose a novel cross-matching method, called (LC)$^2$, for achieving LiDAR localization without a prior point cloud map. To this end, LiDAR measurements are expressed in the form of range images before matching them to reduce the modality discrepancy. Subsequently, the network is trained to extract localization descriptors from disparity and range images. Next, the best matches are employed as a loop factor in a pose graph. Using public datasets that include multiple sessions in significantly different lighting conditions, we demonstrated that LiDAR-based navigation systems could be optimized from image databases and vice versa.

67.1ROMar 28Code
AIM-SLAM: Dense Monocular SLAM via Adaptive and Informative Multi-View Keyframe Prioritization with Foundation Model

Jinwoo Jeon, Dong-Uk Seo, Eungchang Mason Lee et al.

Recent advances in geometric foundation models have emerged as a promising alternative for addressing the challenge of dense reconstruction in monocular visual simultaneous localization and mapping (SLAM). Although geometric foundation models enable SLAM to leverage variable input views, the previous methods remain confined to two-view pairs or fixed-length inputs without sufficient deliberation of geometric context for view selection. To tackle this problem, we propose AIM-SLAM, a dense monocular SLAM framework that exploits an adaptive and informative multi-view keyframe prioritization with dense pointmap predictions from visual geometry grounded transformer (VGGT). Specifically, we introduce the selective information- and geometric-aware multi-view adaptation (SIGMA) module, which employs voxel overlap and information gain to retrieve a candidate set of keyframes and adaptively determine its size. Furthermore, we formulate a joint multi-view Sim(3) optimization that enforces consistent alignment across selected views, substantially improving pose estimation accuracy. The effectiveness of AIM-SLAM is demonstrated on real-world datasets, where it achieves state-of-the-art pose estimation performance and accurate dense reconstruction results. Our system supports ROS integration, with code is available at https://aimslam.github.io/.

CVDec 19, 2023Code
Object-Aware Domain Generalization for Object Detection

Wooju Lee, Dasol Hong, Hyungtae Lim et al.

Single-domain generalization (S-DG) aims to generalize a model to unseen environments with a single-source domain. However, most S-DG approaches have been conducted in the field of classification. When these approaches are applied to object detection, the semantic features of some objects can be damaged, which can lead to imprecise object localization and misclassification. To address these problems, we propose an object-aware domain generalization (OA-DG) method for single-domain generalization in object detection. Our method consists of data augmentation and training strategy, which are called OA-Mix and OA-Loss, respectively. OA-Mix generates multi-domain data with multi-level transformation and object-aware mixing strategy. OA-Loss enables models to learn domain-invariant representations for objects and backgrounds from the original and OA-Mixed images. Our proposed method outperforms state-of-the-art works on standard benchmarks. Our code is available at https://github.com/WoojuLee24/OA-DG.

53.4ROMay 19
CLUE: Adaptively Prioritized Contextual Cues by Leveraging a Unified Semantic Map for Effective Zero-Shot Object-Goal Navigation

Taeyun Kim, Alvin Jinsung Choi, Dasol Hong et al.

Zero-shot object-goal navigation (ZSON) is a challenging problem in robotics that requires a comprehensive understanding of both language and visual observations. Contextual cues from rooms and objects are critical, but their relative importance depends on the target: some objects are strongly tied to specific room types, while others are better predicted by nearby co-located objects. Existing methods overlook this distinction, leading to inefficient and inaccurate exploration. We present CLUE, a novel navigation framework that adaptively balances the use of contextual rooms and objects by leveraging commonsense knowledge extracted from an offline large language model (LLM). By estimating a target's association with room types using LLM, the agent prioritizes room cues for predictable objects and object cues for those with weak room associations. Our framework constructs a unified semantic value map that integrates both types of contextual information, adaptively weighted by the target's ambiguity to guide exploration. Combined with multi-viewpoint verification and an exploration strategy informed by contextual cues, CLUE achieves robust and efficient navigation. Extensive experiments in simulation and real-world deployments show that our method consistently outperforms state-of-the-art baselines in both success rate (SR) and success weighted by path length (SPL), demonstrating its effectiveness and practicality for real-world navigation tasks.

CVDec 27, 2023Code
Domain Generalization with Vital Phase Augmentation

Ingyun Lee, Wooju Lee, Hyun Myung

Deep neural networks have shown remarkable performance in image classification. However, their performance significantly deteriorates with corrupted input data. Domain generalization methods have been proposed to train robust models against out-of-distribution data. Data augmentation in the frequency domain is one of such approaches that enable a model to learn phase features to establish domain-invariant representations. This approach changes the amplitudes of the input data while preserving the phases. However, using fixed phases leads to susceptibility to phase fluctuations because amplitudes and phase fluctuations commonly occur in out-of-distribution. In this study, to address this problem, we introduce an approach using finite variation of the phases of input data rather than maintaining fixed phases. Based on the assumption that the degree of domain-invariant features varies for each phase, we propose a method to distinguish phases based on this degree. In addition, we propose a method called vital phase augmentation (VIPAug) that applies the variation to the phases differently according to the degree of domain-invariant features of given phases. The model depends more on the vital phases that contain more domain-invariant features for attaining robustness to amplitude and phase fluctuations. We present experimental evaluations of our proposed approach, which exhibited improved performance for both clean and corrupted data. VIPAug achieved SOTA performance on the benchmark CIFAR-10 and CIFAR-100 datasets, as well as near-SOTA performance on the ImageNet-100 and ImageNet datasets. Our code is available at https://github.com/excitedkid/vipaug.

CVFeb 1, 2025Code
MambaGlue: Fast and Robust Local Feature Matching With Mamba

Kihwan Ryoo, Hyungtae Lim, Hyun Myung

In recent years, robust matching methods using deep learning-based approaches have been actively studied and improved in computer vision tasks. However, there remains a persistent demand for both robust and fast matching techniques. To address this, we propose a novel Mamba-based local feature matching approach, called MambaGlue, where Mamba is an emerging state-of-the-art architecture rapidly gaining recognition for its superior speed in both training and inference, and promising performance compared with Transformer architectures. In particular, we propose two modules: a) MambaAttention mixer to simultaneously and selectively understand the local and global context through the Mamba-based self-attention structure and b) deep confidence score regressor, which is a multi-layer perceptron (MLP)-based architecture that evaluates a score indicating how confidently matching predictions correspond to the ground-truth correspondences. Consequently, our MambaGlue achieves a balance between robustness and efficiency in real-world applications. As verified on various public datasets, we demonstrate that our MambaGlue yields a substantial performance improvement over baseline approaches while maintaining fast inference speed. Our code will be available on https://github.com/url-kaist/MambaGlue

ROFeb 9
Chamelion: Reliable Change Detection for Long-Term LiDAR Mapping in Transient Environments

Seoyeon Jang, Alex Junho Lee, I Made Aswin Nahrendra et al.

Online change detection is crucial for mobile robots to efficiently navigate through dynamic environments. Detecting changes in transient settings, such as active construction sites or frequently reconfigured indoor spaces, is particularly challenging due to frequent occlusions and spatiotemporal variations. Existing approaches often struggle to detect changes and fail to update the map across different observations. To address these limitations, we propose a dual-head network designed for online change detection and long-term map maintenance. A key difficulty in this task is the collection and alignment of real-world data, as manually registering structural differences over time is both labor-intensive and often impractical. To overcome this, we develop a data augmentation strategy that synthesizes structural changes by importing elements from different scenes, enabling effective model training without the need for extensive ground-truth annotations. Experiments conducted at real-world construction sites and in indoor office environments demonstrate that our approach generalizes well across diverse scenarios, achieving efficient and accurate map updates.\resubmit{Our source code and additional material are available at: https://chamelion-pages.github.io/.

45.8ROMar 17
DreamFlow: Local Navigation Beyond Observation via Conditional Flow Matching in the Latent Space

Jiwon Park, Dongkyu Lee, I Made Aswin Nahrendra et al.

Local navigation in cluttered environments often suffers from dense obstacles and frequent local minima. Conventional local planners rely on heuristics and are prone to failure, while deep reinforcement learning(DRL)based approaches provide adaptability but are constrained by limited onboard sensing. These limitations lead to navigation failures because the robot cannot perceive structures outside its field of view. In this paper, we propose DreamFlow, a DRL-based local navigation framework that extends the robot's perceptual horizon through conditional flow matching(CFM). The proposed CFM based prediction module learns probabilistic mapping between local height map latent representation and broader spatial representation conditioned on navigation context. This enables the navigation policy to predict unobserved environmental features and proactively avoid potential local minima. Experimental results demonstrate that DreamFlow outperforms existing methods in terms of latent prediction accuracy and navigation performance in simulation. The proposed method was further validated in cluttered real world environments with a quadrupedal robot. The project page is available at https://dreamflow-icra.github.io.

CVJun 9, 2025Code
CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization

Dasol Hong, Wooju Lee, Hyun Myung

Prompt tuning, which adapts vision-language models by freezing model parameters and optimizing only the prompt, has proven effective for task-specific adaptations. The core challenge in prompt tuning is improving specialization for a specific task and generalization for unseen domains. However, frozen encoders often produce misaligned features, leading to confusion between classes and limiting specialization. To overcome this issue, we propose a confusion-aware loss (CoA-loss) that improves specialization by refining the decision boundaries between confusing classes. Additionally, we mathematically demonstrate that a mixture model can enhance generalization without compromising specialization. This is achieved using confidence-aware weights (CoA-weights), which adjust the weights of each prediction in the mixture model based on its confidence within the class domains. Extensive experiments show that CoCoA-Mix, a mixture model with CoA-loss and CoA-weights, outperforms state-of-the-art methods by enhancing specialization and generalization. Our code is publicly available at https://github.com/url-kaist/CoCoA-Mix.

ROAug 12, 2021Code
Patchwork: Concentric Zone-based Region-wise Ground Segmentation with Ground Likelihood Estimation Using a 3D LiDAR Sensor

Hyungtae Lim, Minho Oh, Hyun Myung

Ground segmentation is crucial for terrestrial mobile platforms to perform navigation or neighboring object recognition. Unfortunately, the ground is not flat, as it features steep slopes; bumpy roads; or objects, such as curbs, flower beds, and so forth. To tackle the problem, this paper presents a novel ground segmentation method called \textit{Patchwork}, which is robust for addressing the under-segmentation problem and operates at more than 40 Hz. In this paper, a point cloud is encoded into a Concentric Zone Model-based representation to assign an appropriate density of cloud points among bins in a way that is not computationally complex. This is followed by Region-wise Ground Plane Fitting, which is performed to estimate the partial ground for each bin. Finally, Ground Likelihood Estimation is introduced to dramatically reduce false positives. As experimentally verified on SemanticKITTI and rough terrain datasets, our proposed method yields promising performance compared with the state-of-the-art methods, showing faster speed compared with existing plane fitting--based methods. Code is available: https://github.com/LimHyungTae/patchwork

15.1ROMar 17
SE(3)-LIO: Smooth IMU Propagation With Jointly Distributed Poses on SE(3) Manifold for Accurate and Robust LiDAR-Inertial Odometry

Gunhee Shin, Seungjae Lee, Jei Kong et al.

In estimating odometry accurately, an inertial measurement unit (IMU) is widely used owing to its high-rate measurements, which can be utilized to obtain motion information through IMU propagation. In this paper, we address the limitations of existing IMU propagation methods in terms of motion prediction and motion compensation. In motion prediction, the existing methods typically represent a 6-DoF pose by separating rotation and translation and propagate them on their respective manifold, so that the rotational variation is not effectively incorporated into translation propagation. During motion compensation, the relative transformation between predicted poses is used to compensate motion-induced distortion in other measurements, while inherent errors in the predicted poses introduce uncertainty in the relative transformation. To tackle these challenges, we represent and propagate the pose on SE(3) manifold, where propagated translation properly accounts for rotational variation. Furthermore, we precisely characterize the relative transformation uncertainty by considering the correlation between predicted poses, and incorporate this uncertainty into the measurement noise during motion compensation. To this end, we propose a LiDAR-inertial odometry (LIO), referred to as SE(3)-LIO, that integrates the proposed IMU propagation and uncertainty-aware motion compensation (UAMC). We validate the effectiveness of SE(3)-LIO on diverse datasets. Our source code and additional material are available at: https://se3-lio.github.io/.

30.0ROMay 9
Raymoval: Raycasting-based Dynamic Object Removal for Static 3D Mapping

Daebeom Kim, Seungjae Lee, Seoyeon Jang et al.

Static mapping is fundamental to robot navigation, providing a persistent geometric prior and a consistent reference for long-term autonomy. However, dynamic objects leave residual traces and cause surface loss, which reduces map consistency. We propose a raycasting-based module for dynamic object removal in static 3D mapping. Each scan is projected onto an azimuth-elevation grid, and for every viewing direction we compare the bin-wise minimum range with the map's first-hit distance computed by raycasting. Furthermore, we apply a raycast consistency test that separates dynamic from static points. Finally, a spatial consistency validation step refines labels, producing static maps with lower residual dynamics and reduced over-removal. We evaluate our approach quantitatively and qualitatively on SemanticKITTI and a challenging custom dataset, and show consistent static mapping results.

ROMar 6
OpenHEART: Opening Heterogeneous Articulated Objects with a Legged Manipulator

Seonghyeon Lim, Hyeonwoo Lee, Seunghyun Lee et al.

Legged manipulators offer high mobility and versatile manipulation. However, robust interaction with heterogeneous articulated objects, such as doors, drawers, and cabinets, remains challenging because of the diverse articulation types of the objects and the complex dynamics of the legged robot. Existing reinforcement learning (RL)-based approaches often rely on high-dimensional sensory inputs, leading to sample inefficiency. In this paper, we propose a robust and sample-efficient framework for opening heterogeneous articulated objects with a legged manipulator. In particular, we propose Sampling-based Abstracted Feature Extraction (SAFE), which encodes handle and panel geometry into a compact low-dimensional representation, improving cross-domain generalization. Additionally, Articulation Information Estimator (ArtIEst) is introduced to adaptively mix proprioception with exteroception to estimate opening direction and range of motion for each object. The proposed framework was deployed to manipulate various heterogeneous articulated objects in simulation and real-world robot systems. Videos can be found on the project website: https://openheart-icra.github.io/OpenHEART/

24.3CVMar 16
E2EGS: Event-to-Edge Gaussian Splatting for Pose-Free 3D Reconstruction

Yunsoo Kim, Changki Sung, Dasol Hong et al.

The emergence of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS) has advanced novel view synthesis (NVS). These methods, however, require high-quality RGB inputs and accurate corresponding poses, limiting robustness under real-world conditions such as fast camera motion or adverse lighting. Event cameras, which capture brightness changes at each pixel with high temporal resolution and wide dynamic range, enable precise sensing of dynamic scenes and offer a promising solution. However, existing event-based NVS methods either assume known poses or rely on depth estimation models that are bounded by their initial observations, failing to generalize as the camera traverses previously unseen regions. We present E2EGS, a pose-free framework operating solely on event streams. Our key insight is that edge information provides rich structural cues essential for accurate trajectory estimation and high-quality NVS. To extract edges from noisy event streams, we exploit the distinct spatio-temporal characteristics of edges and non-edge regions. The event camera's movement induces consistent events along edges, while non-edge regions produce sparse noise. We leverage this through a patch-based temporal coherence analysis that measures local variance to extract edges while robustly suppressing noise. The extracted edges guide structure-aware Gaussian initialization and enable edge-weighted losses throughout initialization, tracking, and bundle adjustment. Extensive experiments on both synthetic and real datasets demonstrate that E2EGS achieves superior reconstruction quality and trajectory accuracy, establishing a fully pose-free paradigm for event-based 3D reconstruction.

CVApr 16, 2024
Contextrast: Contextual Contrastive Learning for Semantic Segmentation

Changki Sung, Wanhee Kim, Jungho An et al.

Despite great improvements in semantic segmentation, challenges persist because of the lack of local/global contexts and the relationship between them. In this paper, we propose Contextrast, a contrastive learning-based semantic segmentation method that allows to capture local/global contexts and comprehend their relationships. Our proposed method comprises two parts: a) contextual contrastive learning (CCL) and b) boundary-aware negative (BANE) sampling. Contextual contrastive learning obtains local/global context from multi-scale feature aggregation and inter/intra-relationship of features for better discrimination capabilities. Meanwhile, BANE sampling selects embedding features along the boundaries of incorrectly predicted regions to employ them as harder negative samples on our contrastive learning, resolving segmentation issues along the boundary region by exploiting fine-grained details. We demonstrate that our Contextrast substantially enhances the performance of semantic segmentation networks, outperforming state-of-the-art contrastive learning approaches on diverse public datasets, e.g. Cityscapes, CamVid, PASCAL-C, COCO-Stuff, and ADE20K, without an increase in computational cost during inference.

ROOct 20, 2024
DynaVINS++: Robust Visual-Inertial State Estimator in Dynamic Environments by Adaptive Truncated Least Squares and Stable State Recovery

Seungwon Song, Hyungtae Lim, Alex Junho Lee et al.

Despite extensive research in robust visual-inertial navigation systems~(VINS) in dynamic environments, many approaches remain vulnerable to objects that suddenly start moving, which are referred to as \textit{abruptly dynamic objects}. In addition, most approaches have considered the effect of dynamic objects only at the feature association level. In this study, we observed that the state estimation diverges when errors from false correspondences owing to moving objects incorrectly propagate into the IMU bias terms. To overcome these problems, we propose a robust VINS framework called \mbox{\textit{DynaVINS++}}, which employs a) adaptive truncated least square method that adaptively adjusts the truncation range using both feature association and IMU preintegration to effectively minimize the effect of the dynamic objects while reducing the computational cost, and b)~stable state recovery with bias consistency check to correct misestimated IMU bias and to prevent the divergence caused by abruptly dynamic objects. As verified in both public and real-world datasets, our approach shows promising performance in dynamic environments, including scenes with abruptly dynamic objects.

CVMar 4, 2025
PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers

Wooju Lee, Juhye Park, Dasol Hong et al.

Accurate localization is essential for autonomous driving, but GNSS-based methods struggle in challenging environments such as urban canyons. Cross-view pose optimization offers an effective solution by directly estimating vehicle pose using satellite-view images. However, existing methods primarily rely on cross-view features at a given pose, neglecting fine-grained contexts for precision and global contexts for robustness against large initial pose errors. To overcome these limitations, we propose PIDLoc, a novel cross-view pose optimization approach inspired by the proportional-integral-derivative (PID) controller. Using RGB images and LiDAR, the PIDLoc comprises the PID branches to model cross-view feature relationships and the spatially aware pose estimator (SPE) to estimate the pose from these relationships. The PID branches leverage feature differences for local context (P), aggregated feature differences for global context (I), and gradients of feature differences for precise pose adjustment (D) to enhance localization accuracy under large initial pose errors. Integrated with the PID branches, the SPE captures spatial relationships within the PID-branch features for consistent localization. Experimental results demonstrate that the PIDLoc achieves state-of-the-art performance in cross-view pose estimation for the KITTI dataset, reducing position error by $37.8\%$ compared with the previous state-of-the-art.

16.1CVMar 13
VIRD: View-Invariant Representation through Dual-Axis Transformation for Cross-View Pose Estimation

Juhye Park, Wooju Lee, Dasol Hong et al.

Accurate global localization is crucial for autonomous driving and robotics, but GNSS-based approaches often degrade due to occlusion and multipath effects. As an emerging alternative, cross-view pose estimation predicts the 3-DoF camera pose corresponding to a ground-view image with respect to a geo-referenced satellite image. However, existing methods struggle to bridge the significant viewpoint gap between the ground and satellite views mainly due to limited spatial correspondences. We propose a novel cross-view pose estimation method that constructs view-invariant representations through dual-axis transformation (VIRD). VIRD first applies a polar transformation to the satellite view to establish horizontal correspondence, then uses context-enhanced positional attention on the ground and polar-transformed satellite features to resolve vertical misalignment, explicitly mitigating the viewpoint gap. A view-reconstruction loss is introduced to strengthen the view invariance further, encouraging the derived representations to reconstruct the original and cross-view images. Experiments on the KITTI and VIGOR datasets demonstrate that VIRD outperforms the state-of-the-art methods without orientation priors, reducing median position and orientation errors by 50.7% and 76.5% on KITTI, and 18.0% and 46.8% on VIGOR, respectively.

ROJun 14, 2024
Galibr: Targetless LiDAR-Camera Extrinsic Calibration Method via Ground Plane Initialization

Wonho Song, Minho Oh, Jaeyoung Lee et al.

With the rapid development of autonomous driving and SLAM technology, the performance of autonomous systems using multimodal sensors highly relies on accurate extrinsic calibration. Addressing the need for a convenient, maintenance-friendly calibration process in any natural environment, this paper introduces Galibr, a fully automatic targetless LiDAR-camera extrinsic calibration tool designed for ground vehicle platforms in any natural setting. The method utilizes the ground planes and edge information from both LiDAR and camera inputs, streamlining the calibration process. It encompasses two main steps: an initial pose estimation algorithm based on ground planes (GP-init), and a refinement phase through edge extraction and matching. Our approach significantly enhances calibration performance, primarily attributed to our novel initial pose estimation method, as demonstrated in unstructured natural environments, including on the KITTI dataset and the KAIST quadruped dataset.

ROFeb 11, 2022
STEP: State Estimator for Legged Robots Using a Preintegrated foot Velocity Factor

Yeeun Kim, Byeongho Yu, Eungchang Mason Lee et al.

We propose a novel state estimator for legged robots, STEP, achieved through a novel preintegrated foot velocity factor. In the preintegrated foot velocity factor, the usual non-slip assumption is not adopted. Instead, the end effector velocity becomes observable by exploiting the body speed obtained from a stereo camera. In other words, the preintegrated end effector's pose can be estimated. Another advantage of our approach is that it eliminates the necessity for a contact detection step, unlike the typical approaches. The proposed method has also been validated in harsh-environment simulations and real-world experiments containing uneven or slippery terrains.

CVDec 27, 2021
Adversarial Attack for Asynchronous Event-based Data

Wooju Lee, Hyun Myung

Deep neural networks (DNNs) are vulnerable to adversarial examples that are carefully designed to cause the deep learning model to make mistakes. Adversarial examples of 2D images and 3D point clouds have been extensively studied, but studies on event-based data are limited. Event-based data can be an alternative to a 2D image under high-speed movements, such as autonomous driving. However, the given adversarial events make the current deep learning model vulnerable to safety issues. In this work, we generate adversarial examples and then train the robust models for event-based data, for the first time. Our algorithm shifts the time of the original events and generates additional adversarial events. Additional adversarial events are generated in two stages. First, null events are added to the event-based data to generate additional adversarial events. The perturbation size can be controlled with the number of null events. Second, the location and time of additional adversarial events are set to mislead DNNs in a gradient-based attack. Our algorithm achieves an attack success rate of 97.95\% on the N-Caltech101 dataset. Furthermore, the adversarial training model improves robustness on the adversarial event data compared to the original model.

RODec 27, 2021
UV-SLAM: Unconstrained Line-based SLAM Using Vanishing Points for Structural Mapping

Hyunjun Lim, Jinwoo Jeon, Hyun Myung

In feature-based simultaneous localization and mapping (SLAM), line features complement the sparsity of point features, making it possible to map the surrounding environment structure. Existing approaches utilizing line features have primarily employed a measurement model that uses line re-projection. However, the direction vectors used in the 3D line mapping process cannot be corrected because the line measurement model employs only the lines' normal vectors in the Plücker coordinate. As a result, problems like degeneracy that occur during the 3D line mapping process cannot be solved. To tackle the problem, this paper presents a UV-SLAM, which is an unconstrained line-based SLAM using vanishing points for structural mapping. This paper focuses on using structural regularities without any constraints, such as the Manhattan world assumption. For this, we use the vanishing points that can be obtained from the line features. The difference between the vanishing point observation calculated through line features in the image and the vanishing point estimation calculated through the direction vector is defined as a residual and added to the cost function of optimization-based SLAM. Furthermore, through Fisher information matrix rank analysis, we prove that vanishing point measurements guarantee a unique mapping solution. Finally, we demonstrate that the localization accuracy and mapping quality are improved compared to the state-of-the-art algorithms using public datasets.

CVDec 3, 2021
Gesture Recognition with a Skeleton-Based Keyframe Selection Module

Yunsoo Kim, Hyun Myung

We propose a bidirectional consecutively connected two-pathway network (BCCN) for efficient gesture recognition. The BCCN consists of two pathways: (i) a keyframe pathway and (ii) a temporal-attention pathway. The keyframe pathway is configured using the skeleton-based keyframe selection module. Keyframes pass through the pathway to extract the spatial feature of itself, and the temporal-attention pathway extracts temporal semantics. Our model improved gesture recognition performance in videos and obtained better activation maps for spatial and temporal properties. Tests were performed on the Chalearn dataset, the ETRI-Activity 3D dataset, and the Toyota Smart Home dataset.

RONov 30, 2021
WALK-VIO: Walking-motion-Adaptive Leg Kinematic Constraint Visual-Inertial Odometry for Quadruped Robots

Hyunjun Lim, Byeongho Yu, Yeeun Kim et al.

In this paper, WALK-VIO, a novel visual-inertial odometry (VIO) with walking-motion-adaptive leg kinematic constraints that change with body motion for localization of quadruped robots, is proposed. Quadruped robots primarily use VIO because they require fast localization for control and path planning. However, since quadruped robots are mainly used outdoors, extraneous features extracted from the sky or ground cause tracking failures. In addition, the quadruped robots' walking motion cause wobbling, which lowers the localization accuracy due to the camera and inertial measurement unit (IMU). To overcome these limitations, many researchers use VIO with leg kinematic constraints. However, since the quadruped robot's walking motion varies according to the controller, gait, quadruped robots' velocity, and so on, these factors should be considered in the process of adding leg kinematic constraints. We propose VIO that can be used regardless of walking motion by adjusting the leg kinematic constraint factor. In order to evaluate WALK-VIO, we create and publish datasets of quadruped robots that move with various types of walking motion in a simulation environment. In addition, we verified the validity of WALK-VIO through comparison with current state-of-the-art algorithms.

ROSep 2, 2021
MIR-VIO: Mutual Information Residual-based Visual Inertial Odometry with UWB Fusion for Robust Localization

Sungjae Shin, Eungchang Lee, Junho Choi et al.

For many years, there has been an impressive progress on visual odometry applied to mobile robots and drones. However, the visual perception is still in the spotlight as a challenging field because the vision sensor has some problems in obtaining correct scale information with a monocular camera and also is vulnerable to a situation in which illumination is changed. In this paper, UWB sensor fusion is proposed in the visual inertial odometry algorithm as a solution to mitigate this problem. We designed a cost function based on mutual information considering the UWB. Considering the characteristic of the UWB signal model, where the uncertainty increases as the distance between the UWB anchor and the tag increases, we introduced a new residual term to the cost function. When the experiment was conducted in an indoor environment with the above methodology, the initialization problem in an environment with few feature points was solved through the UWB sensor fusion, and localization became robust. And when the residual term using the concept of mutual information was used, the most robust odometry could be obtained.

ROAug 15, 2021
A Morphing Quadrotor that Can Optimize Morphology for Transportation

Chanyoung Kim, Hyungyu Lee, Myeongwoo Jeong et al.

Multirotors can be effectively applied to various tasks, such as transportation, investigation, exploration, and lifesaving, depending on the type of payload. However, due to the nature of multirotors, the payload loaded on the multirotor is limited in its position and weight, which presents a major disadvantage when the multirotor is used in various fields. In this paper, we propose a novel method that greatly improves the restrictions on payload position and weight using a morphing quadrotor system. Our method can estimate the drone's weight, center of gravity position, and inertia tensor in real-time, which change depending on payload, and determine the optimal morphology for efficient and stable flight. An adaptive control method that can reflect the change in flight dynamics by payload and morphing is also presented. Experiments were conducted to confirm that the proposed morphing quadrotor improves the stability and efficiency in various situations of transporting payloads compared with the conventional quadrotor systems.

ROAug 11, 2021
Low-level Pose Control of Tilting Multirotor for Wall Perching Tasks Using Reinforcement Learning

Hyungyu Lee, Myeongwoo Jeong, Chanyoung Kim et al.

Recently, needs for unmanned aerial vehicles (UAVs) that are attachable to the wall have been highlighted. As one of the ways to address the need, researches on various tilting multirotors that can increase maneuverability has been employed. Unfortunately, existing studies on the tilting multirotors require considerable amounts of prior information on the complex dynamic model. Meanwhile, reinforcement learning on quadrotors has been studied to mitigate this issue. Yet, these are only been applied to standard quadrotors, whose systems are less complex than those of tilting multirotors. In this paper, a novel reinforcement learning-based method is proposed to control a tilting multirotor on real-world applications, which is the first attempt to apply reinforcement learning to a tilting multirotor. To do so, we propose a novel reward function for a neural network model that takes power efficiency into account. The model is initially trained over a simulated environment and then fine-tuned using real-world data in order to overcome the sim-to-real gap issue. Furthermore, a novel, efficient state representation with respect to the goal frame that helps the network learn optimal policy better is proposed. As verified on real-world experiments, our proposed method shows robust controllability by overcoming the complex dynamics of tilting multirotors.

ROAug 5, 2021
REAL: Rapid Exploration with Active Loop-Closing toward Large-Scale 3D Mapping using UAVs

Eungchang Mason Lee, Junho Choi, Hyungtae Lim et al.

Exploring an unknown environment without colliding with obstacles is one of the essentials of autonomous vehicles to perform diverse missions such as structural inspections, rescues, deliveries, and so forth. Therefore, unmanned aerial vehicles (UAVs), which are fast, agile, and have high degrees of freedom, have been widely used. However, previous approaches have two limitations: a) First, they may not be appropriate for exploring large-scale environments because they mainly depend on random sampling-based path planning that causes unnecessary movements. b) Second, they assume the pose estimation is accurate enough, which is the most critical factor in obtaining an accurate map. In this paper, to explore and map unknown large-scale environments rapidly and accurately, we propose a novel exploration method that combines the pre-calculated Peacock Trajectory with graph-based global exploration and active loop-closing. Because the two-step trajectory that considers the kinodynamics of UAVs is used, obstacle avoidance is guaranteed in the receding-horizon manner. In addition, local exploration that considers the frontier and global exploration based on the graph maximizes the speed of exploration by minimizing unnecessary revisiting. In addition, by actively closing the loop based on the likelihood, pose estimation performance is improved. The proposed method's performance is verified by exploring 3D simulation environments in comparison with the state-of-the-art methods. Finally, the proposed approach is validated in a real-world experiment.

CVJun 18, 2021
Equivariance-bridged SO(2)-Invariant Representation Learning using Graph Convolutional Network

Sungwon Hwang, Hyungtae Lim, Hyun Myung

Training a Convolutional Neural Network (CNN) to be robust against rotation has mostly been done with data augmentation. In this paper, another progressive vision of research direction is highlighted to encourage less dependence on data augmentation by achieving structural rotational invariance of a network. The deep equivariance-bridged SO(2) invariant network is proposed to echo such vision. First, Self-Weighted Nearest Neighbors Graph Convolutional Network (SWN-GCN) is proposed to implement Graph Convolutional Network (GCN) on the graph representation of an image to acquire rotationally equivariant representation, as GCN is more suitable for constructing deeper network than spectral graph convolution-based approaches. Then, invariant representation is eventually obtained with Global Average Pooling (GAP), a permutation-invariant operation suitable for aggregating high-dimensional representations, over the equivariant set of vertices retrieved from SWN-GCN. Our method achieves the state-of-the-art image classification performance on rotated MNIST and CIFAR-10 images, where the models are trained with a non-augmented dataset only. Quantitative validations over invariance of the representations also demonstrate strong invariance of deep representations of SWN-GCN over rotations.

CVMar 7, 2021
ERASOR: Egocentric Ratio of Pseudo Occupancy-based Dynamic Object Removal for Static 3D Point Cloud Map Building

Hyungtae Lim, Sungwon Hwang, Hyun Myung

Scan data of urban environments often include representations of dynamic objects, such as vehicles, pedestrians, and so forth. However, when it comes to constructing a 3D point cloud map with sequential accumulations of the scan data, the dynamic objects often leave unwanted traces in the map. These traces of dynamic objects act as obstacles and thus impede mobile vehicles from achieving good localization and navigation performances. To tackle the problem, this paper presents a novel static map building method called ERASOR, Egocentric RAtio of pSeudo Occupancy-based dynamic object Removal, which is fast and robust to motion ambiguity. Our approach directs its attention to the nature of most dynamic objects in urban environments being inevitably in contact with the ground. Accordingly, we propose the novel concept called pseudo occupancy to express the occupancy of unit space and then discriminate spaces of varying occupancy. Finally, Region-wise Ground Plane Fitting (R-GPF) is adopted to distinguish static points from dynamic points within the candidate bins that potentially contain dynamic points. As experimentally verified on SemanticKITTI, our proposed method yields promising performance against state-of-the-art methods overcoming the limitations of existing ray tracing-based and visibility-based methods.

ROMar 2, 2021
Run Your Visual-Inertial Odometry on NVIDIA Jetson: Benchmark Tests on a Micro Aerial Vehicle

Jinwoo Jeon, Sungwook Jung, Eungchang Lee et al.

This paper presents benchmark tests of various visual(-inertial) odometry algorithms on NVIDIA Jetson platforms. The compared algorithms include mono and stereo, covering Visual Odometry (VO) and Visual-Inertial Odometry (VIO): VINS-Mono, VINS-Fusion, Kimera, ALVIO, Stereo-MSCKF, ORB-SLAM2 stereo, and ROVIO. As these methods are mainly used for unmanned aerial vehicles (UAVs), they must perform well in situations where the size of the processing board and weight is limited. Jetson boards released by NVIDIA satisfy these constraints as they have a sufficiently powerful central processing unit (CPU) and graphics processing unit (GPU) for image processing. However, in existing studies, the performance of Jetson boards as a processing platform for executing VO/VIO has not been compared extensively in terms of the usage of computing resources and accuracy. Therefore, this study compares representative VO/VIO algorithms on several NVIDIA Jetson platforms, namely NVIDIA Jetson TX2, Xavier NX, and AGX Xavier, and introduces a novel dataset 'KAIST VIO dataset' for UAVs. Including pure rotations, the dataset has several geometric trajectories that are harsh to visual(-inertial) state estimation. The evaluation is performed in terms of the accuracy of estimated odometry, CPU usage, and memory usage on various Jetson boards, algorithms, and trajectories. We present the {results of the} comprehensive benchmark test and release the dataset for the computer vision and robotics applications.

ROMar 2, 2021
Avoiding Degeneracy for Monocular Visual SLAM with Point and Line Features

Hyunjun Lim, Yeeun Kim, Kwangik Jung et al.

In this paper, a degeneracy avoidance method for a point and line based visual SLAM algorithm is proposed. Visual SLAM predominantly uses point features. However, point features lack robustness in low texture and illuminance variant environments. Therefore, line features are used to compensate the weaknesses of point features. In addition, point features are poor in representing discernable features for the naked eye, meaning mapped point features cannot be recognized. To overcome the limitations above, line features were actively employed in previous studies. However, since degeneracy arises in the process of using line features, this paper attempts to solve this problem. First, a simple method to identify degenerate lines is presented. In addition, a novel structural constraint is proposed to avoid the degeneracy problem. At last, a point and line based monocular SLAM system using a robust optical-flow based lien tracking method is implemented. The results are verified using experiments with the EuRoC dataset and compared with other state-of-the-art algorithms. It is proven that our method yields more accurate localization as well as mapping results.

RODec 30, 2020
ALVIO: Adaptive Line and Point Feature-based Visual Inertial Odometry for Robust Localization in Indoor Environments

KwangYik Jung, YeEun Kim, HyunJun Lim et al.

The amount of texture can be rich or deficient depending on the objects and the structures of the building. The conventional mono visual-initial navigation system (VINS)-based localization techniques perform well in environments where stable features are guaranteed. However, their performance is not assured in a changing indoor environment. As a solution to this, we propose Adaptive Line and point feature-based Visual Inertial Odometry (ALVIO) in this paper. ALVIO actively exploits the geometrical information of lines that exist in abundance in an indoor space. By using a strong line tracker and adaptive selection of feature-based tightly coupled optimization, it is possible to perform robust localization in a variable texture environment. The structural characteristics of ALVIO are as follows: First, the proposed optical flow-based line tracker performs robust line feature tracking and management. By using epipolar geometry and trigonometry, accurate 3D lines are recovered. These 3D lines are used to calculate the line re-projection error. Finally, with the sensitivity-analysis-based adaptive feature selection in the optimization process, we can estimate the pose robustly in various indoor environments. We validate the performance of our system on public datasets and compare it against other state-of the-art algorithms (S-MSKCF, VINS-Mono). In the proposed algorithm based on point and line feature selection, translation RMSE increased by 16.06% compared to VINS-Mono, while total optimization time decreased by up to 49.31%. Through this, we proved that it is a useful algorithm as a real-time pose estimation algorithm.

RODec 29, 2020
Peacock Exploration: A Lightweight Exploration for UAV using Control-Efficient Trajectory

EungChang Mason Lee, Duckyu Choi, Hyun Myung

Unmanned Aerial Vehicles have received much attention in recent years due to its wide range of applications, such as exploration of an unknown environment to acquire a 3D map without prior knowledge of it. Existing exploration methods have been largely challenged by computationally heavy probabilistic path planning. Similarly, kinodynamic constraints or proper sensors considering the payload for UAVs were not considered. In this paper, to solve those issues and to consider the limited payload and computational resource of UAVs, we propose "Peacock Exploration": A lightweight exploration method for UAVs using precomputed minimum snap trajectories which look like a peacock's tail. Using the widely known, control efficient minimum snap trajectories and OctoMap, the UAV equipped with a RGB-D camera can explore unknown 3D environments without any prior knowledge or human-guidance with only O(logN) computational complexity. It also adopts the receding horizon approach and simple, heuristic scoring criteria. The proposed algorithm's performance is demonstrated by exploring a challenging 3D maze environment and compared with a state-of-the-art algorithm.

CVAug 4, 2020
MSDPN: Monocular Depth Prediction with Partial Laser Observation using Multi-stage Neural Networks

Hyungtae Lim, Hyeonjae Gil, Hyun Myung

In this study, a deep-learning-based multi-stage network architecture called Multi-Stage Depth Prediction Network (MSDPN) is proposed to predict a dense depth map using a 2D LiDAR and a monocular camera. Our proposed network consists of a multi-stage encoder-decoder architecture and Cross Stage Feature Aggregation (CSFA). The proposed multi-stage encoder-decoder architecture alleviates the partial observation problem caused by the characteristics of a 2D LiDAR, and CSFA prevents the multi-stage network from diluting the features and allows the network to learn the inter-spatial relationship between features better. Previous works use sub-sampled data from the ground truth as an input rather than actual 2D LiDAR data. In contrast, our approach trains the model and conducts experiments with a physically-collected 2D LiDAR dataset. To this end, we acquired our own dataset called KAIST RGBD-scan dataset and validated the effectiveness and the robustness of MSDPN under realistic conditions. As verified experimentally, our network yields promising performance against state-of-the-art methods. Additionally, we analyzed the performance of different input methods and confirmed that the reference depth map is robust in untrained scenarios.

ROAug 4, 2020
Development and Analysis of Digging and Soil Removing Mechanisms for Mole-Bot: Bio-Inspired Mole-Like Drilling Robot

Junseok Lee, Christian Tirtawardhana, Hyun Myung

Interests in exploration of new energy resources are increasing due to the exhaustion of existing resources. To explore new energy sources, various studies have been conducted to improve the drilling performance of drilling equipment for deep and strong ground. However, with better performance, the modern drilling equipment is bulky and, furthermore, has become inconvenient in both installation and operation, for it takes complex procedures for complex terrains. Moreover, environmental issues are also a concern because of the excessive use of mud and slurry to remove excavated soil. To overcome these limitations, a mechanism that combines an expandable drill bit and link structure to simulate the function of the teeth and forelimbs of a mole is proposed. In this paper, the proposed expandable drill bit simplifies the complexity and high number of degrees of freedom of the animal head. In addition, a debris removal mechanism mimicking a shoulder structure and forefoot movement is proposed. For efficient debris removal, the proposed mechanism enables the simultaneous rotation and expanding/folding motions of the drill bit by using a single actuator. The performance of the proposed system is evaluated by dynamic simulations and experiments.

ROAug 4, 2020
BRM Localization: UAV Localization in GNSS-Denied Environments Based on Matching of Numerical Map and UAV Images

Junho Choi, Hyun Myung

Localization is one of the most important technologies needed to use Unmanned Aerial Vehicles (UAVs) in actual fields. Currently, most UAVs use GNSS to estimate their position. Recently, there have been attacks that target the weaknesses of UAVs that use GNSS, such as interrupting GNSS signal to crash the UAVs or sending fake GNSS signals to hijack the UAVs. To avoid this kind of situation, this paper proposes an algorithm that deals with the localization problem of the UAV in GNSS-denied environments. We propose a localization method, named as BRM (Building Ratio Map based) localization, for a UAV by matching an existing numerical map with UAV images. The building area is extracted from the UAV images. The ratio of buildings that occupy in the corresponding image frame is calculated and matched with the building information on the numerical map. The position estimation is started in the range of several km^2 area, so that the position estimation can be performed without knowing the exact initial coordinate. Only freely available maps are used for training data set and matching the ground truth. Finally, we get real UAV images, IMU data, and GNSS data from UAV flight to show that the proposed method can achieve better performance than the conventional methods.