Daeun Song

RO
h-index18
5papers
80citations
Novelty56%
AI Score48

5 Papers

ROMay 2
AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning

Yangzhe Kong, Daeun Song, Jing Liang et al.

We present a novel method, AutoSpatial, an efficient approach with structured spatial grounding to enhance VLMs' spatial reasoning. By combining minimal manual supervision with large-scale Visual Question-Answering (VQA) pairs auto-labeling, our approach tackles the challenge of VLMs' limited spatial understanding in social navigation tasks. By applying a hierarchical two-round VQA strategy during training, AutoSpatial achieves both global and detailed understanding of scenarios, demonstrating more accurate spatial perception, movement prediction, Chain of Thought (CoT) reasoning, final action, and explanation compared to other SOTA approaches. These five components are essential for comprehensive social navigation reasoning. Our approach was evaluated using both expert systems (GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet) that provided cross-validation scores and human evaluators who assigned relative rankings to compare model performances across four key aspects. Augmented by the enhanced spatial reasoning capabilities, AutoSpatial demonstrates substantial improvements by averaged cross-validation score from expert systems in: perception & prediction (up to 10.71%), reasoning (up to 16.26%), action (up to 20.50%), and explanation (up to 18.73%) compared to baseline models trained only on manually annotated data.

ROMar 21Code
E-SocialNav: Efficient Socially Compliant Navigation with Language Models

Ling Xiao, Daeun Song, Xuesu Xiao et al.

Language models (LMs) are increasingly applied to robotic navigation; however, existing benchmarks primarily emphasize navigation success rates while paying limited attention to social compliance. Moreover, relying on large-scale LMs can raise efficiency concerns, as their heavy computational overhead leads to slower response times and higher energy consumption, making them impractical for real-time deployment on resource-constrained robotic platforms. In this work, we evaluate the social compliance of GPT-4o and Claude in robotic navigation and propose E-SocialNav, an efficient LM designed for socially compliant navigation. Despite being trained on a relatively small dataset, E-SocialNav consistently outperforms zero-shot baselines in generating socially compliant behaviors. By employing a two-stage training pipeline consisting of supervised fine-tuning followed by direct preference optimization, E-SocialNav achieves strong performance in both text-level semantic similarity to human annotations and action accuracy. The source code is available at https://github.com/Dr-LingXiao/ESocialNav.

CVApr 4, 2024Code
AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

Tianrui Guan, Ruiqi Xian, Xijun Wang et al.

We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps. AGL-NET tackles two critical challenges: bridging the representation gap between image and points modalities for robust feature matching, and handling inherent scale discrepancies between global view and local view. To address these challenges, AGL-NET leverages a unified network architecture with a novel two-stage matching design. The first stage extracts informative neural features directly from raw sensor data and performs initial feature matching. The second stage refines this matching process by extracting informative skeleton features and incorporating a novel scale alignment step to rectify scale variations between LiDAR and map data. Furthermore, a novel scale and skeleton loss function guides the network toward learning scale-invariant feature representations, eliminating the need for pre-processing satellite maps. This significantly improves real-world applicability in scenarios with unknown map scales. To facilitate rigorous performance evaluation, we introduce a meticulously designed dataset within the CARLA simulator specifically tailored for metric localization training and assessment. The code and data can be accessed at https://github.com/rayguan97/AGL-Net.

RONov 19, 2020
Solving Footstep Planning as a Feasibility Problem using L1-norm Minimization (Extended Version)

Daeun Song, Pierre Fernbach, Thomas Flayols et al.

One challenge of legged locomotion on uneven terrains is to deal with both the discrete problem of selecting a contact surface for each footstep and the continuous problem of placing each footstep on the selected surface. Consequently, footstep planning can be addressed with a Mixed Integer Program (MIP), an elegant but computationally-demanding method, which can make it unsuitable for online planning. We reformulate the MIP into a cardinality problem, then approximate it as a computationally efficient l1-norm minimisation, called SL1M. Moreover, we improve the performance and convergence of SL1M by combining it with a sampling-based root trajectory planner to prune irrelevant surface candidates. Our tests on the humanoid Talos in four representative scenarios show that SL1M always converges faster than MIP. For scenarios when the combinatorial complexity is small (< 10 surfaces per step), SL1M converges at least two times faster than MIP with no need for pruning. In more complex cases, SL1M converges up to 100 times faster than MIP with the help of pruning. Moreover, pruning can also improve the MIP computation time. The versatility of the framework is shown with additional tests on the quadruped robot ANYmal.

ROSep 19, 2019
SL1M: Sparse L1-norm Minimization for contact planning on uneven terrain

Steve Tonneau, Daeun Song, Pierre Fernbach et al.

One of the main challenges of planning legged locomotion in complex environments is the combinatorial contact selection problem. Recent contributions propose to use integer variables to represent which contact surface is selected, and then to rely on modern mixed-integer (MI) optimization solvers to handle this combinatorial issue. To reduce the computational cost of MI, we exploit the sparsity properties of L1 norm minimization techniques to relax the contact planning problem into a feasibility linear program. Our approach accounts for kinematic reachability of the center of mass (COM) and of the contact effectors. We ensure the existence of a quasi-static COM trajectory by restricting our plan to quasi-flat contacts. For planning 10 steps with less than 10 potential contact surfaces for each phase, our approach is 50 to 100 times faster that its MI counterpart, which suggests potential applications for online contact re-planning. The method is demonstrated in simulation with the humanoid robots HRP-2 and Talos over various scenarios.