Dongjae Lee

RO
h-index21
16papers
227citations
Novelty47%
AI Score56

16 Papers

ROMay 15Code
LAPS: Improving Incremental LiDAR Mapping using Active Pooling and Sampling for Neural Distance Fields

Dongjae Lee, Wooseong Yang, Yifu Tao et al.

Neural distance fields offer a compact and continuous representation of 3D geometry, making them attractive for incremental LiDAR mapping. However, their online optimization is vulnerable to catastrophic forgetting, where new observations can degrade previously reconstructed geometry. Replay-based training is commonly used to address this issue, but existing methods typically rely on passive replay buffers and uniform sampling, which can waste memory on redundant observations and under-train poorly constrained regions. We propose LAPS, a replay management framework for incremental neural mapping that improves both replay retention and replay allocation during online updates. LAPS combines reliability-based active pooling to retain reliable historical samples under limited memory with uncertainty-guided active sampling to focus optimization on under-constrained regions. Experiments on synthetic and real-world benchmarks show that LAPS consistently improves reconstruction completeness while maintaining competitive geometric accuracy. On Oxford Spires, it improves recall by 4.66 pp and F1-score by 3.79 pp over PIN-SLAM on the Blenheim Palace 05 sequence. We release our open source implementation at: https://github.com/dongjae0107/LAPS.

ROMar 13
UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies

Harsh Gupta, Xiaofeng Guo, Huy Ha et al.

We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments-such as aerial manipulators-is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller's tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse-and even highly constrained-embodiments. All code, data, checkpoints, and result videos can be found at umi-on-air.github.io.

ROApr 3
Geometrically-Constrained Radar-Inertial Odometry via Continuous Point-Pose Uncertainty Modeling

Wooseong Yang, Dongjae Lee, Minwoo Jung et al.

Radar odometry is crucial for robust localization in challenging environments; however, the sparsity of reliable returns and distinctive noise characteristics impede its performance. This paper introduces geometrically-constrained radar-inertial odometry and mapping that jointly consolidates point and pose uncertainty. We employ the continuous trajectory model to estimate the pose uncertainty at any arbitrary timestamp by propagating uncertainties of the control points. These pose uncertainties are continuously integrated with heteroscedastic measurement uncertainty during point projection, thereby enabling dynamic evaluation of observation confidence and adaptive down-weighting of uninformative radar points. By leveraging quantified uncertainties in radar mapping, we construct a high-fidelity map that improves odometry accuracy under imprecise radar measurements. Moreover, we reveal the effectiveness of explicit geometrical constraints in radar-inertial odometry when incorporated with the proposed uncertainty-aware mapping framework. Extensive experiments on diverse real-world datasets demonstrate the superiority of our method, yielding substantial performance improvements in both accuracy and efficiency compared to existing baselines.

CVJun 20, 2025Code
Camera Calibration via Circular Patterns: A Comprehensive Framework with Measurement Uncertainty and Unbiased Projection Model

Chaehyeon Song, Dongjae Lee, Jongwoo Lim et al.

Camera calibration using planar targets has been widely favored, and two types of control points have been mainly considered as measurements: the corners of the checkerboard and the centroid of circles. Since a centroid is derived from numerous pixels, the circular pattern provides more precise measurements than the checkerboard. However, the existing projection model of circle centroids is biased under lens distortion, resulting in low performance. To surmount this limitation, we propose an unbiased projection model of the circular pattern and demonstrate its superior accuracy compared to the checkerboard. Complementing this, we introduce uncertainty into circular patterns to enhance calibration robustness and completeness. Defining centroid uncertainty improves the performance of calibration components, including pattern detection, optimization, and evaluation metrics. We also provide guidelines for performing good camera calibration based on the evaluation metric. The core concept of this approach is to model the boundary points of a two-dimensional shape as a Markov random field, considering its connectivity. The shape distribution is propagated to the centroid uncertainty through an appropriate shape representation based on the Green theorem. Consequently, the resulting framework achieves marked gains in calibration accuracy and robustness. The complete source code and demonstration video are available at https://github.com/chaehyeonsong/discocal.

ROMay 8, 2025Code
The City that Never Settles: Simulation-based LiDAR Dataset for Long-Term Place Recognition Under Extreme Structural Changes

Hyunho Song, Dongjae Lee, Seunghun Oh et al.

Large-scale construction and demolition significantly challenge long-term place recognition (PR) by drastically reshaping urban and suburban environments. Existing datasets predominantly reflect limited or indoor-focused changes, failing to adequately represent extensive outdoor transformations. To bridge this gap, we introduce the City that Never Settles (CNS) dataset, a simulation-based dataset created using the CARLA simulator, capturing major structural changes-such as building construction and demolition-across diverse maps and sequences. Additionally, we propose TCR_sym, a symmetric version of the original TCR metric, enabling consistent measurement of structural changes irrespective of source-target ordering. Quantitative comparisons demonstrate that CNS encompasses more extensive transformations than current real-world benchmarks. Evaluations of state-of-the-art LiDAR-based PR methods on CNS reveal substantial performance degradation, underscoring the need for robust algorithms capable of handling significant environmental changes. Our dataset is available at https://github.com/Hyunho111/CNS_dataset.

CVFeb 15, 2024
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Jisu Nam, Heesu Kim, DongJae Lee et al.

The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the reference concepts using unique text embeddings often fail to accurately mimic the appearance of the reference. To address this, one solution may be explicitly conditioning the reference images into the target denoising process, known as key-value replacement. However, prior works are constrained to local editing since they disrupt the structure path of the pre-trained T2I model. To overcome this, we propose a novel plug-in method, called DreamMatcher, which reformulates T2I personalization as semantic matching. Specifically, DreamMatcher replaces the target values with reference values aligned by semantic matching, while leaving the structure path unchanged to preserve the versatile capability of pre-trained T2I models for generating diverse structures. We also introduce a semantic-consistent masking strategy to isolate the personalized concept from irrelevant regions introduced by the target prompts. Compatible with existing T2I models, DreamMatcher shows significant improvements in complex scenarios. Intensive analyses demonstrate the effectiveness of our approach.

CLApr 2, 2024
HyperCLOVA X Technical Report

Kang Min Yoo, Jaegeun Han, Sookyo In et al.

We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

HCMar 24, 2025
VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification

Jungjae Lee, Dongjae Lee, Chihun Choi et al.

Large Foundation Models (LFMs) have unlocked new possibilities in human-computer interaction, particularly with the rise of mobile Graphical User Interface (GUI) Agents capable of interacting with mobile GUIs. These agents allow users to automate complex mobile tasks through simple natural language instructions. However, the inherent probabilistic nature of LFMs, coupled with the ambiguity and context-dependence of mobile tasks, makes LFM-based automation unreliable and prone to errors. To address this critical challenge, we introduce VeriSafe Agent (VSA): a formal verification system that serves as a logically grounded safeguard for Mobile GUI Agents. VSA deterministically ensures that an agent's actions strictly align with user intent before executing the action. At its core, VSA introduces a novel autoformalization technique that translates natural language user instructions into a formally verifiable specification. This enables runtime, rule-based verification of agent's actions, detecting erroneous actions even before they take effect. To the best of our knowledge, VSA is the first attempt to bring the rigor of formal verification to GUI agents, bridging the gap between LFM-driven actions and formal software verification. We implement VSA using off-the-shelf LFM services (GPT-4o) and evaluate its performance on 300 user instructions across 18 widely used mobile apps. The results demonstrate that VSA achieves 94.33%-98.33% accuracy in verifying agent actions, outperforming existing LFM-based verification methods by 30.00%-16.33%, and increases the GUI agent's task completion rate by 90%-130%.

SDMay 19, 2025
VocalAgent: Large Language Models for Vocal Health Diagnostics with Safety-Aware Evaluation

Yubin Kim, Taehan Kim, Wonjune Kang et al.

Vocal health plays a crucial role in peoples' lives, significantly impacting their communicative abilities and interactions. However, despite the global prevalence of voice disorders, many lack access to convenient diagnosis and treatment. This paper introduces VocalAgent, an audio large language model (LLM) to address these challenges through vocal health diagnosis. We leverage Qwen-Audio-Chat fine-tuned on three datasets collected in-situ from hospital patients, and present a multifaceted evaluation framework encompassing a safety assessment to mitigate diagnostic biases, cross-lingual performance analysis, and modality ablation studies. VocalAgent demonstrates superior accuracy on voice disorder classification compared to state-of-the-art baselines. Its LLM-based method offers a scalable solution for broader adoption of health diagnostics, while underscoring the importance of ethical and technical validation.

CVNov 21, 2025
SVRecon: Sparse Voxel Rasterization for Surface Reconstruction

Seunghun Oh, Jaesung Choe, Dongjae Lee et al.

We extend the recently proposed sparse voxel rasterization paradigm to the task of high-fidelity surface reconstruction by integrating Signed Distance Function (SDF), named SVRecon. Unlike 3D Gaussians, sparse voxels are spatially disentangled from their neighbors and have sharp boundaries, which makes them prone to local minima during optimization. Although SDF values provide a naturally smooth and continuous geometric field, preserving this smoothness across independently parameterized sparse voxels is nontrivial. To address this challenge, we promote coherent and smooth voxel-wise structure through (1) robust geometric initialization using a visual geometry model and (2) a spatial smoothness loss that enforces coherent relationships across parent-child and sibling voxel groups. Extensive experiments across various benchmarks show that our method achieves strong reconstruction accuracy while having consistently speedy convergence. The code will be made public.

ROOct 28, 2025
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration

Jongsuk Kim, Jaeyoung Lee, Gyojin Han et al.

Recent advancements in deep learning and the availability of high-quality real-world driving datasets have propelled end-to-end autonomous driving. Despite this progress, relying solely on real-world data limits the variety of driving scenarios for training. Synthetic scenario generation has emerged as a promising solution to enrich the diversity of training data; however, its application within E2E AD models remains largely unexplored. This is primarily due to the absence of a designated ego vehicle and the associated sensor inputs, such as camera or LiDAR, typically provided in real-world scenarios. To address this gap, we introduce SynAD, the first framework designed to enhance real-world E2E AD models using synthetic data. Our method designates the agent with the most comprehensive driving information as the ego vehicle in a multi-agent synthetic scenario. We further project path-level scenarios onto maps and employ a newly developed Map-to-BEV Network to derive bird's-eye-view features without relying on sensor inputs. Finally, we devise a training strategy that effectively integrates these map-based synthetic data with real driving data. Experimental results demonstrate that SynAD effectively integrates all components and notably enhances safety performance. By bridging synthetic scenario generation and E2E AD, SynAD paves the way for more comprehensive and robust autonomous driving models.

CVMay 24, 2023
FaceFusion: Exploiting Full Spectrum of Multiple Datasets

Chiyoung Song, Dongjae Lee

The size of training dataset is known to be among the most dominating aspects of training high-performance face recognition embedding model. Building a large dataset from scratch could be cumbersome and time-intensive, while combining multiple already-built datasets poses the risk of introducing large amount of label noise. We present a novel training method, named FaceFusion. It creates a fused view of different datasets that is untainted by identity conflicts, while concurrently training an embedding network using the view in an end-to-end fashion. Using the unified view of combined datasets enables the embedding network to be trained against the entire spectrum of the datasets, leading to a noticeable performance boost. Extensive experiments confirm superiority of our method, whose performance in public evaluation datasets surpasses not only that of using a single training dataset, but also that of previously known methods under various training circumstances.

ROJul 14, 2021
Robust and Recursively Feasible Real-Time Trajectory Planning in Unknown Environments

Inkyu Jang, Dongjae Lee, Seungjae Lee et al.

Motion planners for mobile robots in unknown environments face the challenge of simultaneously maintaining both robustness against unmodeled uncertainties and persistent feasibility of the trajectory-finding problem. That is, while dealing with uncertainties, a motion planner must update its trajectory, adapting to the newly revealed environment in real-time; failing to do so may involve unsafe circumstances. Many existing planning algorithms guarantee these by maintaining the clearance needed to perform an emergency brake, which is itself a robust and persistently feasible maneuver. However, such maneuvers are not applicable for systems in which braking is impossible or risky, such as fixed-wing aircraft. To that end, we propose a real-time robust planner that recursively guarantees persistent feasibility without any need of braking. The planner ensures robustness against bounded uncertainties and persistent feasibility by constructing a loop of sequentially composed funnels, starting from the receding horizon local trajectory's forward reachable set. We implement the proposed algorithm for a robotic car tracking a speed-fixed reference trajectory. The experiment results show that the proposed algorithm can be run at faster than 16 Hz, while successfully keeping the system away from entering any dead-end, to maintain safety and feasibility.

ROJul 6, 2021
Real-Time Motion Planning of a Hydraulic Excavator using Trajectory Optimization and Model Predictive Control

Dongjae Lee, Inkyu Jang, Jeonghyun Byun et al.

Automation of excavation tasks requires real-time trajectory planning satisfying various constraints. To guarantee both constraint feasibility and real-time trajectory re-plannability, we present an integrated framework for real-time optimization-based trajectory planning of a hydraulic excavator. The proposed framework is composed of two main modules: a global planner and a real-time local planner. The global planner computes the entire global trajectory considering excavation volume and energy minimization while the local counterpart tracks the global trajectory in a receding horizon manner, satisfying dynamic feasibility, physical constraints, and disturbance-awareness. We validate the proposed planning algorithm in a simulation environment where two types of operations are conducted in the presence of emulated disturbance from hydraulic friction and soil-bucket interaction: shallow and deep excavation. The optimized global trajectories are obtained in an order of a second, which is tracked by the local planner at faster than 30 Hz. To the best of our knowledge, this work presents the first real-time motion planning framework that satisfies constraints of a hydraulic excavator, such as force/torque, power, cylinder displacement, and flow rate limits.

ROJul 1, 2021
Stability and Robustness Analysis of Plug-Pulling using an Aerial Manipulator

Jeonghyun Byun, Dongjae Lee, Hoseong Seo et al.

In this paper, an autonomous aerial manipulation task of pulling a plug out of an electric socket is conducted, where maintaining the stability and robustness is challenging due to sudden disappearance of a large interaction force. The abrupt change in the dynamical model before and after the separation of the plug can cause destabilization or mission failure. To accomplish aerial plug-pulling, we employ the concept of hybrid automata to divide the task into three operative modes, i.e, wire-pulling, stabilizing, and free-flight. Also, a strategy for trajectory generation and a design of disturbance-observer-based controllers for each operative mode are presented. Furthermore, the theory of hybrid automata is used to prove the stability and robustness during the mode transition. We validate the proposed trajectory generation and control method by an actual wire-pulling experiment with a multirotor-based aerial manipulator.

ROMar 18, 2020
Aerial Manipulation using Model Predictive Control for Opening a Hinged Door

Dongjae Lee, Hoseong Seo, Dabin Kim et al.

Existing studies for environment interaction with an aerial robot have been focused on interaction with static surroundings. However, to fully explore the concept of an aerial manipulation, interaction with moving structures should also be considered. In this paper, a multirotor-based aerial manipulator opening a daily-life moving structure, a hinged door, is presented. In order to address the constrained motion of the structure and to avoid collisions during operation, model predictive control (MPC) is applied to the derived coupled system dynamics between the aerial manipulator and the door involving state constraints. By implementing a constrained version of differential dynamic programming (DDP), MPC can generate position setpoints to the disturbance observer (DOB)-based robust controller in real-time, which is validated by our experimental results.