CVJun 3Code
CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimationYurim Jeon, Dongseong Seo, Seung-Woo Seo
Cross-view geo-localization estimates the geographic location of a ground image by matching it against an aerial image database. Existing methods tackle this through either large-scale retrieval or precise pose estimation, but not both: retrieval-based methods enable wide-area search at the cost of localization accuracy, while pose estimation methods achieve high precision within only a narrow search space. Naively cascading these pipelines introduces error propagation and inconsistent feature representations. We formulate cross-view geo-localization as a unified problem requiring simultaneous city-scale retrieval and precise 3-DoF pose estimation. We propose CIPER (Cross-view Image-retrieval and Pose-estimation transformER), a single architecture that jointly performs both tasks through mutually beneficial feature learning. CIPER uses a shared transformer encoder with task-specific tokens to disentangle global retrieval features from spatial localization cues. To bridge the large domain gap between ground and aerial views, we introduce a two-way transformer pose decoder that uses ground features as spatial queries for bidirectional cross-attention. A set prediction strategy further enables stable 3-DoF regression under a unified multi-task objective. Experiments on VIGOR, KITTI, and Ford Multi-AV demonstrate competitive performance, especially under limited field-of-view and arbitrary orientation conditions. Code is available at https://github.com/yurimjeon1892/CIPER.
CVMay 28
From General Vision to Reliable Traversability Estimation: Adapting Vision Foundation Models for Unstructured Outdoor EnvironmentsJi-Hoon Hwang, Jisung Bae, Dong-Wook Kim et al.
Vision-based approaches have become the dominant paradigm for traversability estimation in unstructured outdoor environments, typically adapting vision foundation models (VFMs) via semantic segmentation supervision. However, this paradigm faces three fundamental challenges that undermine its reliability: the task-agnostic design of VFMs, the ambiguity of traversability annotations, and the discrepancy between semantic labels and physical safety. We propose Vision-to-Traversability Adaptation (ViTA), a framework that adapts VFMs for reliable traversability estimation, instantiated on SAM2. ViTA injects task-specific knowledge through learnable traversability prompts while preserving the VFM's cross-domain generalization. To handle annotation ambiguity, we introduce Perspective-Diversified Training, which estimates semantic uncertainty to suppress confident predictions at ambiguous boundaries. To bridge the semantic-traversability discrepancy, we distill geometric knowledge during training, enabling slope and elevation reasoning from RGB images alone at inference. The semantic and geometric outputs are fused into a continuous traversability score that reflects both semantic uncertainty and geometric risk. Evaluations across diverse domains, including challenging real-world off-road datasets, demonstrate that ViTA achieves state-of-the-art IoU and Precision with substantial false-positive reduction and strong cross-domain generalization.
ROMay 28
How to Relieve Distribution Shifts in Semantic Segmentation for Off-Road EnvironmentsJi-Hoon Hwang, Daeyoung Kim, Hyung-Suk Yoon et al.
Semantic segmentation is crucial for autonomous navigation in off-road environments, enabling precise classification of surroundings to identify traversable regions. However, distinctive factors inherent to off-road conditions, such as source-target domain discrepancies and sensor corruption from rough terrain, can result in distribution shifts that alter the data differently from the trained conditions. This often leads to inaccurate semantic label predictions and subsequent failures in navigation tasks. To address this, we propose ST-Seg, a novel framework that expands the source distribution through style expansion (SE) and texture regularization (TR). Unlike prior methods that implicitly apply generalization within a fixed source distribution, ST-Seg offers an intuitive approach for distribution shift. Specifically, SE broadens domain coverage by generating diverse realistic styles, augmenting the limited style information of the source domain. TR stabilizes local texture representation affected by style-augmented learning through a deep texture manifold. Experiments across various distribution-shifted target domains demonstrate the effectiveness of ST-Seg, with substantial improvements over existing methods. These results highlight the robustness of ST-Seg, enhancing the real-world applicability of semantic segmentation for off-road navigation.
LGNov 7, 2023
SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution SituationsChan Kim, Jaekyung Cho, Christophe Bobda et al.
Robotic agents trained using reinforcement learning have the problem of taking unreliable actions in an out-of-distribution (OOD) state. Agents can easily become OOD in real-world environments because it is almost impossible for them to visit and learn the entire state space during training. Unfortunately, unreliable actions do not ensure that agents perform their original tasks successfully. Therefore, agents should be able to recognize whether they are in OOD states and learn how to return to the learned state distribution rather than continue to take unreliable actions. In this study, we propose a novel method for retraining agents to recover from OOD situations in a self-supervised manner when they fall into OOD states. Our in-depth experimental results demonstrate that our method substantially improves the agent's ability to recover from OOD situations in terms of sample efficiency and restoration of the performance for the original tasks. Moreover, we show that our method can retrain the agent to recover from OOD situations even when in-distribution states are difficult to visit through exploration.
ROMay 25
How to Mitigate the Distribution Shift Problem in Robotics Control: A Robust and Adaptive Approach Based on Offline to Online Imitation LearningHyung-Suk Yoon, Seung-Woo Seo
Distribution shift in imitation learning refers to the problem that the agent cannot plan proper actions for a state that has not been visited during the training. This problem can be largely attributed to the inherently narrow state-action coverage provided by expert demonstrations over the full environment. In this paper, we propose a robust offline to adaptive online imitation learning framework that handles the distribution shift problem in a lifelong, multi-phase scheme. In the offline learning phase, we leverage supplementary demonstrations to broaden the state-action coverage of the policy by utilizing a discriminator to effectively train the policy with supplementary demonstrations, thereby enhancing the robustness of the policy to distribution shift. In the subsequent online inference phase, our framework detects the occurrence of distribution shift and conducts self-supervised imitation learning from online experiences to adapt the policy to the online environments. Through extensive evaluations in MuJoCo environments, we demonstrate that our method exhibits better robustness to distribution shift and better adaptation performance to online environments than the baseline algorithms, which indicates superior performance of our framework against the distribution shift.
ROJun 3, 2022
GIN: Graph-based Interaction-aware Constraint Policy Optimization for Autonomous DrivingSe-Wook Yoo, Chan Kim, Jin-Woo Choi et al.
Applying reinforcement learning to autonomous driving entails particular challenges, primarily due to dynamically changing traffic flows. To address such challenges, it is necessary to quickly determine response strategies to the changing intentions of surrounding vehicles. This paper proposes a new policy optimization method for safe driving using graph-based interaction-aware constraints. In this framework, the motion prediction and control modules are trained simultaneously while sharing a latent representation that contains a social context. To reflect social interactions, we illustrate the movements of agents in graph form and filter the features with the graph convolution networks. This helps preserve the spatiotemporal locality of adjacent nodes. Furthermore, we create feedback loops to combine these two modules effectively. As a result, this approach encourages the learned controller to be safe from dynamic risks and renders the motion prediction robust to abnormal movements. In the experiment, we set up a navigation scenario comprising various situations with CARLA, an urban driving simulator. The experiments show state-of-the-art performance on navigation strategy and motion prediction compared to the baselines.
ROSep 16, 2024
E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language ModelsChan Kim, Keonwoo Kim, Mintaek Oh et al.
Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. Given that real-world environments are inherently stochastic, initial plans based solely on LLMs' general knowledge may fail to achieve their objectives, unlike in static scenarios. To address this limitation, this study introduces the Experience-and-Emotion Map (E2Map), which integrates not only LLM knowledge but also the agent's real-world experiences, drawing inspiration from human emotional responses. The proposed methodology enables one-shot behavior adjustments by updating the E2Map based on the agent's experiences. Our evaluation in stochastic navigation environments, including both simulations and real-world scenarios, demonstrates that the proposed method significantly enhances performance in stochastic environments compared to existing LLM-based approaches. Code and supplementary materials are available at https://e2map.github.io/.
LGJun 19, 2022
Learning Multi-Task Transferable Rewards via Variational Inverse Reinforcement LearningSe-Wook Yoo, Seung-Woo Seo
Many robotic tasks are composed of a lot of temporally correlated sub-tasks in a highly complex environment. It is important to discover situational intentions and proper actions by deliberating on temporal abstractions to solve problems effectively. To understand the intention separated from changing task dynamics, we extend an empowerment-based regularization technique to situations with multiple tasks based on the framework of a generative adversarial network. Under the multitask environments with unknown dynamics, we focus on learning a reward and policy from the unlabeled expert examples. In this study, we define situational empowerment as the maximum of mutual information representing how an action conditioned on both a certain state and sub-task affects the future. Our proposed method derives the variational lower bound of the situational mutual information to optimize it. We simultaneously learn the transferable multi-task reward function and policy by adding an induced term to the objective function. By doing so, the multi-task reward function helps to learn a robust policy for environmental change. We validate the advantages of our approach on multi-task learning and multi-task transfer learning. We demonstrate our proposed method has the robustness of both randomness and changing task dynamics. Finally, we prove that our method has significantly better performance and data efficiency than existing imitation learning methods on various benchmarks.
LGNov 17, 2023
Imagination-Augmented Hierarchical Reinforcement Learning for Safe and Interactive Autonomous Driving in Urban EnvironmentsSang-Hyun Lee, Yoonjae Jung, Seung-Woo Seo
Hierarchical reinforcement learning (HRL) incorporates temporal abstraction into reinforcement learning (RL) by explicitly taking advantage of hierarchical structure. Modern HRL typically designs a hierarchical agent composed of a high-level policy and low-level policies. The high-level policy selects which low-level policy to activate at a lower frequency and the activated low-level policy selects an action at each time step. Recent HRL algorithms have achieved performance gains over standard RL algorithms in synthetic navigation tasks. However, we cannot apply these HRL algorithms to real-world navigation tasks. One of the main challenges is that real-world navigation tasks require an agent to perform safe and interactive behaviors in dynamic environments. In this paper, we propose imagination-augmented HRL (IAHRL) that efficiently integrates imagination into HRL to enable an agent to learn safe and interactive behaviors in real-world navigation tasks. Imagination is to predict the consequences of actions without interactions with actual environments. The key idea behind IAHRL is that the low-level policies imagine safe and structured behaviors, and then the high-level policy infers interactions with surrounding objects by interpreting the imagined behaviors. We also introduce a new attention mechanism that allows our high-level policy to be permutation-invariant to the order of surrounding objects and to prioritize our agent over them. To evaluate IAHRL, we introduce five complex urban driving tasks, which are among the most challenging real-world navigation tasks. The experimental results indicate that IAHRL enables an agent to perform safe and interactive behaviors, achieving higher success rates and lower average episode steps than baselines.
LGNov 15, 2023
Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific KnowledgeSang-Hyun Lee, Seung-Woo Seo
A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.
LGFeb 3
Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RLJinwoo Choi, Sang-Hyun Lee, Seung-Woo Seo
Offline goal-conditioned reinforcement learning remains challenging for long-horizon tasks. While hierarchical approaches mitigate this issue by decomposing tasks, most existing methods rely on separate high- and low-level networks and generate only a single intermediate subgoal, making them inadequate for complex tasks that require coordinating multiple intermediate decisions. To address this limitation, we draw inspiration from the chain-of-thought paradigm and propose the Chain-of-Goals Hierarchical Policy (CoGHP), a novel framework that reformulates hierarchical decision-making as autoregressive sequence modeling within a unified architecture. Given a state and a final goal, CoGHP autoregressively generates a sequence of latent subgoals followed by the primitive action, where each latent subgoal acts as a reasoning step that conditions subsequent predictions. To implement this efficiently, we pioneer the use of an MLP-Mixer backbone, which supports cross-token communication and captures structural relationships among state, goal, latent subgoals, and action. Across challenging navigation and manipulation benchmarks, CoGHP consistently outperforms strong offline baselines, demonstrating improved performance on long-horizon tasks.
ROMay 22, 2024
Autonomous Algorithm for Training Autonomous Vehicles with Minimal Human InterventionSang-Hyun Lee, Daehyeok Kwon, Seung-Woo Seo
Recent reinforcement learning (RL) algorithms have demonstrated impressive results in simulated driving environments. However, autonomous vehicles trained in simulation often struggle to work well in the real world due to the fidelity gap between simulated and real-world environments. While directly training real-world autonomous vehicles with RL algorithms is a promising approach to bypass the fidelity gap problem, it presents several challenges. One critical yet often overlooked challenge is the need to reset a driving environment between every episode. This reset process demands significant human intervention, leading to poor training efficiency in the real world. In this paper, we introduce a novel autonomous algorithm that enables off-the-shelf RL algorithms to train autonomous vehicles with minimal human intervention. Our algorithm reduces unnecessary human intervention by aborting episodes to prevent unsafe states and identifying informative initial states for subsequent episodes. The key idea behind identifying informative initial states is to estimate the expected amount of information that can be obtained from under-explored but reachable states. Our algorithm also revisits rule-based autonomous driving algorithms and highlights their benefits in safely returning an autonomous vehicle to initial states. To evaluate how much human intervention is required during training, we implement challenging urban driving tasks that require an autonomous vehicle to reset to initial states on its own. The experimental results show that our autonomous algorithm is task-agnostic and achieves competitive driving performance with much less human intervention than baselines.
ROMar 21, 2025
LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement LearningChan Kim, Seung-Woo Seo, Seong-Woo Kim
Deep Reinforcement Learning (DRL) has demonstrated strong performance in robotic control but remains susceptible to out-of-distribution (OOD) states, often resulting in unreliable actions and task failure. While previous methods have focused on minimizing or preventing OOD occurrences, they largely neglect recovery once an agent encounters such states. Although the latest research has attempted to address this by guiding agents back to in-distribution states, their reliance on uncertainty estimation hinders scalability in complex environments. To overcome this limitation, we introduce Language Models for Out-of-Distribution Recovery (LaMOuR), which enables recovery learning without relying on uncertainty estimation. LaMOuR generates dense reward codes that guide the agent back to a state where it can successfully perform its original task, leveraging the capabilities of LVLMs in image description, logical reasoning, and code generation. Experimental results show that LaMOuR substantially enhances recovery efficiency across diverse locomotion tasks and even generalizes effectively to complex environments, including humanoid locomotion and mobile manipulation, where existing methods struggle. The code and supplementary materials are available at https://lamour-rl.github.io/.
CVAug 6, 2025
Radar-Based NLoS Pedestrian Localization for Darting-Out Scenarios Near Parked Vehicles with Camera-Assisted Point Cloud InterpretationHee-Yeun Kim, Byeonggyu Park, Byonghyok Choi et al.
The presence of Non-Line-of-Sight (NLoS) blind spots resulting from roadside parking in urban environments poses a significant challenge to road safety, particularly due to the sudden emergence of pedestrians. mmWave technology leverages diffraction and reflection to observe NLoS regions, and recent studies have demonstrated its potential for detecting obscured objects. However, existing approaches predominantly rely on predefined spatial information or assume simple wall reflections, thereby limiting their generalizability and practical applicability. A particular challenge arises in scenarios where pedestrians suddenly appear from between parked vehicles, as these parked vehicles act as temporary spatial obstructions. Furthermore, since parked vehicles are dynamic and may relocate over time, spatial information obtained from satellite maps or other predefined sources may not accurately reflect real-time road conditions, leading to erroneous sensor interpretations. To address this limitation, we propose an NLoS pedestrian localization framework that integrates monocular camera image with 2D radar point cloud (PCD) data. The proposed method initially detects parked vehicles through image segmentation, estimates depth to infer approximate spatial characteristics, and subsequently refines this information using 2D radar PCD to achieve precise spatial inference. Experimental evaluations conducted in real-world urban road environments demonstrate that the proposed approach enhances early pedestrian detection and contributes to improved road safety. Supplementary materials are available at https://hiyeun.github.io/NLoS/.
LGApr 21, 2025
Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length AdjustmentJinwoo Choi, Seung-Woo Seo
Reinforcement learning (RL) has made significant progress in various domains, but scaling it to long-horizon tasks with complex decision-making remains challenging. Skill learning attempts to address this by abstracting actions into higher-level behaviors. However, current approaches often fail to recognize semantically similar behaviors as the same skill and use fixed skill lengths, limiting flexibility and generalization. To address this, we propose Dynamic Contrastive Skill Learning (DCSL), a novel framework that redefines skill representation and learning. DCSL introduces three key ideas: state-transition based skill representation, skill similarity function learning, and dynamic skill length adjustment. By focusing on state transitions and leveraging contrastive learning, DCSL effectively captures the semantic context of behaviors and adapts skill lengths to match the appropriate temporal extent of behaviors. Our approach enables more flexible and adaptive skill extraction, particularly in complex or noisy datasets, and demonstrates competitive performance compared to existing methods in task completion and efficiency.
LGJan 30, 2025
DIAL: Distribution-Informed Adaptive Learning of Multi-Task Constraints for Safety-Critical SystemsSe-Wook Yoo, Seung-Woo Seo
Safe reinforcement learning has traditionally relied on predefined constraint functions to ensure safety in complex real-world tasks, such as autonomous driving. However, defining these functions accurately for varied tasks is a persistent challenge. Recent research highlights the potential of leveraging pre-acquired task-agnostic knowledge to enhance both safety and sample efficiency in related tasks. Building on this insight, we propose a novel method to learn shared constraint distributions across multiple tasks. Our approach identifies the shared constraints through imitation learning and then adapts to new tasks by adjusting risk levels within these learned distributions. This adaptability addresses variations in risk sensitivity stemming from expert-specific biases, ensuring consistent adherence to general safety principles even with imperfect demonstrations. Our method can be applied to control and navigation domains, including multi-task and meta-task scenarios, accommodating constraints such as maintaining safe distances or adhering to speed limits. Experimental results validate the efficacy of our approach, demonstrating superior safety performance and success rates compared to baselines, all without requiring task-specific constraint definitions. These findings underscore the versatility and practicality of our method across a wide range of real-world tasks.
LGJan 6, 2024
Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free OptimizationMin-Kook Suh, Seung-Woo Seo
We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks. While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent. However, although steepest descent methods offer an intuitive approach to finding minima, many deep learning applications require adaptive gradient methods to achieve faster convergence. In this paper, we interpret adaptive gradient methods as steepest descent applied on parameter-scaled networks, proposing learning-rate-free adaptive gradient methods. Experimental results verify the effectiveness of this approach, demonstrating comparable performance to hand-tuned learning rates across various scenarios. This work extends the applicability of learning-rate-free methods, enhancing training with adaptive gradient methods.
LGMay 2, 2023
Long-Tailed Recognition by Mutual Information Maximization between Latent Features and Ground-Truth LabelsMin-Kook Suh, Seung-Woo Seo
Although contrastive learning methods have shown prevailing performance on a variety of representation learning tasks, they encounter difficulty when the training dataset is long-tailed. Many researchers have combined contrastive learning and a logit adjustment technique to address this problem, but the combinations are done ad-hoc and a theoretical background has not yet been provided. The goal of this paper is to provide the background and further improve the performance. First, we show that the fundamental reason contrastive learning methods struggle with long-tailed tasks is that they try to maximize the mutual information maximization between latent features and input data. As ground-truth labels are not considered in the maximization, they are not able to address imbalances between class labels. Rather, we interpret the long-tailed recognition task as a mutual information maximization between latent features and ground-truth labels. This approach integrates contrastive learning and logit adjustment seamlessly to derive a loss function that shows state-of-the-art performance on long-tailed recognition benchmarks. It also demonstrates its efficacy in image segmentation tasks, verifying its versatility beyond image classification.
LGMay 3, 2021
RetCL: A Selection-based Approach for Retrosynthesis via Contrastive LearningHankook Lee, Sungsoo Ahn, Seung-Woo Seo et al.
Retrosynthesis, of which the goal is to find a set of reactants for synthesizing a target product, is an emerging research area of deep learning. While the existing approaches have shown promising results, they currently lack the ability to consider availability (e.g., stability or purchasability) of the reactants or generalize to unseen reaction templates (i.e., chemical reaction rules). In this paper, we propose a new approach that mitigates the issues by reformulating retrosynthesis into a selection problem of reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), for enumerating all of the candidate molecules based on selection scores computed by graph neural networks. For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO {database} are given as candidates, our RetCL achieves top-1 exact match accuracy of $71.3\%$ for the USPTO-50k benchmark, while a recent transformer-based approach achieves $59.6\%$. We also demonstrate that RetCL generalizes well to unseen templates in various settings in contrast to template-based approaches.
RONov 24, 2020
A Robotic Dating Coaching System Leveraging Online Communities PostsSihyeon Jo, Donghwi Jung, Keonwoo Kim et al.
Can a robot be a personal dating coach? Even with the increasing amount of conversational data on the internet, the implementation of conversational robots remains a challenge. In particular, a detailed and professional counseling log is expensive and not publicly accessible. In this paper, we develop a robot dating coaching system leveraging corpus from online communities. We examine people's perceptions of the dating coaching robot with a dialogue module. 97 participants joined to have a conversation with the robot, and 30 of them evaluated the robot. The results indicate that participants thought the robot could become a dating coach while considering the robot is entertaining rather than helpful.