SY · Jul 5, 2018
An integrated localization-navigation scheme for distance-based docking of UAVs
Thien-Minh Nguyen, Zhirong Qiu, Muqing Cao et al.
In this paper we study the distance-based docking problem of unmanned aerial vehicles (UAVs) by using a single landmark placed at an arbitrarily unknown position. To solve the problem, we propose an integrated estimation-control scheme to simultaneously achieve the relative localization and navigation tasks for discrete-time integrators under bounded velocity: a nonlinear adaptive estimation scheme to estimate the relative position to the landmark, and a delicate control scheme to ensure both the convergence of the estimation and the asymptotic docking at the given landmark. A rigorous proof of convergence is provided by invoking the discrete-time LaSalle's invariance principle, and we also validate our theoretical findings on quadcopters equipped with ultra-wideband ranging sensors and optical flow sensors in a GPS-less environment.
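To illustrate why range-only measurements to a single landmark can suffice, the sketch below recovers the relative position by batch least squares under the discrete-time integrator model p[k+1] = p[k] + u[k]. It is not the nonlinear adaptive estimator proposed in the paper; the function name, the 2D setting, and the convention that the relative position is agent minus landmark are all assumptions made for this example.

```python
def estimate_relative_position(displacements, ranges):
    """Least-squares estimate of the 2D position relative to a landmark.

    Hypothetical helper, not the paper's estimator.
    displacements: list of (ux, uy) moves between consecutive range readings.
    ranges: distances to the landmark, one more entry than displacements.
    Returns (initial relative position, current relative position).
    """
    # Normal equations (sum u u^T) p0 = sum u * rhs for the unknown p0.
    a11 = a12 = a22 = b1 = b2 = 0.0
    sx = sy = 0.0  # displacement accumulated before the current move
    for (ux, uy), d_prev, d_next in zip(displacements, ranges, ranges[1:]):
        # d_next^2 = d_prev^2 + 2 u.(p0 + s) + |u|^2, which is linear in p0.
        rhs = 0.5 * (d_next**2 - d_prev**2 - (ux * ux + uy * uy))
        rhs -= ux * sx + uy * sy
        a11 += ux * ux
        a12 += ux * uy
        a22 += uy * uy
        b1 += ux * rhs
        b2 += uy * rhs
        sx += ux
        sy += uy
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        # Moves along a single line leave one direction unobservable.
        raise ValueError("displacements are collinear; position not observable")
    p0 = ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
    return p0, (p0[0] + sx, p0[1] + sy)
```

The collinearity check mirrors the abstract's point that the control scheme must also ensure convergence of the estimation: the agent has to move in more than one direction for the relative position to be identifiable.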
66.9 · RO · Mar 18
OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms
Zhongyuang Liu, Min He, Shaonan Yu et al.
Language-guided embodied navigation requires an agent to interpret object-referential instructions, search across multiple rooms, localize the referenced target, and execute reliable motion toward it. Existing systems remain limited in real indoor environments because narrow field-of-view sensing exposes only a partial local scene at each step, often forcing repeated rotations, delaying target discovery, and producing fragmented spatial understanding; meanwhile, directly prompting LLMs with dense 3D maps or exhaustive object lists quickly exceeds the context budget. We present OmniVLN, a zero-shot visual-language navigation framework that couples omnidirectional 3D perception with token-efficient hierarchical reasoning for both aerial and ground robots. OmniVLN fuses a rotating LiDAR and panoramic vision into a hardware-agnostic mapping stack, incrementally constructs a five-layer Dynamic Scene Graph (DSG) from mesh geometry to room- and building-level structure, and stabilizes high-level topology through persistent-homology-based room partitioning and hybrid geometric/VLM relation verification. For navigation, the global DSG is transformed into an agent-centric 3D octant representation with multi-resolution spatial attention prompting, enabling the LLM to progressively filter candidate rooms, infer egocentric orientation, localize target objects, and emit executable navigation primitives while preserving fine local detail and compact long-range memory. Experiments show that the proposed hierarchical interface improves spatial referring accuracy from 77.27% to 93.18%, reduces cumulative prompt tokens by up to 61.7% in cluttered multi-room settings, and improves navigation success by up to 11.68% over a flat-list baseline. We will release the code and an omnidirectional multimodal dataset to support reproducible research.
44.7 · RO · Apr 1
IA-TIGRIS: An Incremental and Adaptive Sampling-Based Planner for Online Informative Path Planning
Brady Moon, Nayana Suvarna, Andrew Jong et al.
Planning paths that maximize information gain for robotic platforms has wide-ranging applications and significant potential impact. To effectively adapt to real-time data collection, informative path planning must be computed online and be responsive to new observations. In this work, we present IA-TIGRIS (Incremental and Adaptive Tree-based Information Gathering Using Informed Sampling), which is an incremental and adaptive sampling-based informative path planner designed for real-time onboard execution. Our approach leverages past planning efforts through incremental refinement while continuously adapting to updated belief maps. We additionally present detailed implementation and optimization insights to facilitate real-world deployment, along with an array of reward functions tailored to specific missions and behaviors. Extensive simulation results demonstrate IA-TIGRIS generates higher-quality paths compared to baseline methods. We validate our planner on two distinct hardware platforms: a hexarotor unmanned aerial vehicle (UAV) and a fixed-wing UAV, each having different motion models and configuration spaces. Our results show up to a 38% improvement in information gain compared to baseline methods, highlighting the planner's potential for deployment in real-world applications. Project website: https://ia-tigris.github.io
62.0 · RO · Mar 16
Topological Motion Planning Diffusion: Generative Tangle-Free Path Planning for Tethered Robots in Obstacle-Rich Environments
Yifu Tian, Xinhang Xu, Thien-Minh Nguyen et al.
In extreme environments such as underwater exploration and post-disaster rescue, tethered robots require continuous navigation while avoiding cable entanglement. Traditional planners struggle in these lifelong planning scenarios due to topological unawareness, while topology-augmented graph-search methods face computational bottlenecks in obstacle-rich environments where the number of candidate topological classes increases. To address these challenges, we propose Topological Motion Planning Diffusion (TMPD), a novel generative planning framework that integrates lifelong topological memory. Instead of relying on sequential graph search, TMPD leverages a diffusion model to propose a multimodal front-end of kinematically feasible trajectory candidates across various homotopy classes. A tether-aware topological back-end then filters and optimizes these candidates by computing generalized winding numbers to evaluate their topological energy against the accumulated tether configuration. Benchmarking in obstacle-rich simulated environments demonstrates that TMPD achieves a collision-free reach of 100% and a tangle-free rate of 97.0%, outperforming traditional topological search and purely kinematic diffusion baselines in both geometric smoothness and computational efficiency. Simulation with realistic cable dynamics further validates the practicality of the proposed approach.
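As a rough illustration of how winding can separate homotopy classes, the sketch below sums the signed angle that a 2D polyline sweeps around a point obstacle. It is a simplified planar stand-in, not the generalized winding number or topological energy used by TMPD; the function name and the point-obstacle model are assumptions made for this example.

```python
import math

def winding_angle(path, obstacle):
    """Total signed angle (radians) swept by a 2D polyline around a point.

    Hypothetical helper: path is a list of (x, y) waypoints and obstacle is
    an (x, y) point that the path does not pass through.
    """
    ox, oy = obstacle
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        ax, ay = x0 - ox, y0 - oy
        bx, by = x1 - ox, y1 - oy
        # Signed angle from vector a to vector b via cross and dot products.
        total += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
    return total
```

Two paths with the same endpoints that pass on opposite sides of the obstacle differ by 2π in this quantity, so comparing a candidate's winding angle with the angle already accumulated by the tether gives a simple tangle indicator.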
30.7 · RO · Mar 13
Learning Energy-Efficient Air-Ground Actuation for Hybrid Robots on Stair-Like Terrain
Jiaxing Li, Wen Tian, Xinhang Xu et al.
Hybrid aerial-ground robots offer both traversability and endurance, but stair-like discontinuities create a trade-off: wheels alone often stall at edges, while flight is energy-hungry for small height gains. We propose an energy-aware reinforcement learning framework that trains a single continuous policy to coordinate propellers, wheels, and tilt servos without predefined aerial and ground modes. We train policies from proprioception and a local height scan in Isaac Lab with parallel environments, using hardware-calibrated thrust/power models so the reward penalizes true electrical energy. The learned policy discovers thrust-assisted driving that blends aerial thrust and ground traction. In simulation it achieves about 4 times lower energy than propeller-only control. We transfer the policy to a DoubleBee prototype on an 8 cm gap-climbing task; it achieves 38% lower average power than a rule-based decoupled controller. These results show that efficient hybrid actuation can emerge from learning and deploy on hardware.
81.0 · RO · Mar 13
UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies
Harsh Gupta, Xiaofeng Guo, Huy Ha et al.
We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments, such as aerial manipulators, is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller's tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse (and even highly constrained) embodiments. All code, data, checkpoints, and result videos can be found at umi-on-air.github.io.
OC · Sep 10, 2021 · Code
DIRECT: A Differential Dynamic Programming Based Framework for Trajectory Generation
Kun Cao, Muqing Cao, Shenghai Yuan et al.
This paper introduces a differential dynamic programming (DDP) based framework for polynomial trajectory generation for differentially flat systems. In particular, instead of using a linear equation of increasing size to represent multiple polynomial segments as in the literature, we take a new perspective from state-space representation such that the linear equation reduces to a finite-horizon control system with a fixed state dimension, and the required continuity conditions for consecutive polynomials are automatically satisfied. Consequently, the constrained trajectory generation problem (both with and without time optimization) can be converted to a discrete-time finite-horizon optimal control problem with inequality constraints, which can be approached by a recently developed interior-point DDP (IPDDP) algorithm. Furthermore, for unconstrained trajectory generation with preallocated time, we show that this problem is indeed a linear-quadratic tracking (LQT) problem (a DDP algorithm with exactly one iteration). All these algorithms enjoy linear complexity with respect to the number of segments. Both numerical comparisons with state-of-the-art methods and physical experiments are presented to verify and validate the effectiveness of our theoretical findings. The implementation code will be open-sourced,
CV · Sep 14, 2025
Beyond Frame-wise Tracking: A Trajectory-based Paradigm for Efficient Point Cloud Tracking
BaiChen Fan, Sifan Zhou, Jian Li et al.
LiDAR-based 3D single object tracking (3D SOT) is a critical task in robotics and autonomous systems. Existing methods typically follow either a frame-wise motion estimation paradigm or a sequence-based paradigm. Two-frame methods are efficient but lack long-term temporal context, making them vulnerable in sparse or occluded scenes, while sequence-based methods that process multiple point clouds gain robustness at a significant computational cost. To resolve this dilemma, we propose a novel trajectory-based paradigm and its instantiation, TrajTrack. TrajTrack is a lightweight framework that enhances a base two-frame tracker by implicitly learning motion continuity from historical bounding box trajectories alone, without requiring additional, costly point cloud inputs. It first generates a fast, explicit motion proposal and then uses an implicit motion modeling module to predict the future trajectory, which in turn refines and corrects the initial proposal. Extensive experiments on the large-scale NuScenes benchmark show that TrajTrack achieves new state-of-the-art performance, dramatically improving tracking precision by 4.48% over a strong baseline while running at 56 FPS. We also demonstrate the strong generalizability of TrajTrack across different base trackers. Video is available at https://www.bilibili.com/video/BV1ahYgzmEWP.
RO · Feb 1, 2022
NTU VIRAL: A Visual-Inertial-Ranging-Lidar Dataset, From an Aerial Vehicle Viewpoint
Thien-Minh Nguyen, Shenghai Yuan, Muqing Cao et al.
In recent years, autonomous robots have become ubiquitous in research and daily life. Among many factors, public datasets play an important role in the progress of this field, as they waive the tall order of initial investment in hardware and manpower. However, for research on autonomous aerial systems, there appears to be a relative lack of public datasets on par with those used for autonomous driving and ground robots. Thus, to fill in this gap, we conduct a data collection exercise on an aerial platform equipped with an extensive and unique set of sensors: two 3D lidars, two hardware-synchronized global-shutter cameras, multiple Inertial Measurement Units (IMUs), and especially, multiple Ultra-wideband (UWB) ranging units. The comprehensive sensor suite resembles that of an autonomous driving car, but features distinct and challenging characteristics of aerial operations. We record multiple datasets in several challenging indoor and outdoor conditions. Calibration results and ground truth from a high-accuracy laser tracker are also included in each package. All resources can be accessed via our webpage https://ntu-aris.github.io/ntu_viral_dataset.
RO · May 7, 2021
VIRAL SLAM: Tightly Coupled Camera-IMU-UWB-Lidar SLAM
Thien-Minh Nguyen, Shenghai Yuan, Muqing Cao et al.
In this paper, we propose a tightly-coupled, multi-modal simultaneous localization and mapping (SLAM) framework, integrating an extensive set of sensors: IMU, cameras, multiple lidars, and Ultra-wideband (UWB) range measurements, hence referred to as VIRAL (visual-inertial-ranging-lidar) SLAM. To achieve such a comprehensive sensor fusion system, one has to tackle several challenges such as data synchronization, multi-threaded programming, bundle adjustment (BA), and conflicting coordinate frames between UWB and the onboard sensors, so as to ensure real-time localization and smooth updates in the state estimates. To this end, we propose a two-stage approach. In the first stage, lidar, camera, and IMU data on a local sliding window are processed in a core odometry thread. From this local graph, new key frames are evaluated for admission to a global map. Visual feature-based loop closure is also performed to supplement the global factor graph with loop constraints. When the global factor graph satisfies a condition on spatial diversity, the BA process is triggered to update the coordinate transform between UWB and onboard SLAM systems. The system then seamlessly transitions to the second stage, where all sensors are tightly integrated in the odometry thread. The capability of our system is demonstrated via several experiments on high-fidelity graphical-physical simulation and public datasets.
RO · Apr 24, 2021
MILIOM: Tightly Coupled Multi-Input Lidar-Inertia Odometry and Mapping
Thien-Minh Nguyen, Shenghai Yuan, Muqing Cao et al.
In this letter we investigate a tightly coupled Lidar-Inertia Odometry and Mapping (LIOM) scheme, with the capability to incorporate multiple lidars with complementary fields of view (FOV). In essence, we devise a time-synchronized scheme to combine extracted features from separate lidars into a single pointcloud, which is then used to construct a local map and compute the feature-map matching (FMM) coefficients. These coefficients, along with the IMU preintegration observations, are then used to construct a factor graph that will be optimized to produce an estimate of the sliding window trajectory. We also propose a key frame-based map management strategy to marginalize certain poses and pointclouds in the sliding window to grow a global map, which is used to assemble the local map in the later stage. The use of multiple lidars with complementary FOV and the global map ensures that our estimate has low drift and can sustain good localization in situations where the use of a single lidar gives poor results or even fails to work. Multi-threaded computation is also adopted to cut down the computation time and ensure real-time performance. We demonstrate the efficacy of our system via a series of experiments on public datasets collected from an aerial vehicle.
RO · Feb 2, 2021
Vision Based Autonomous UAV Plane Estimation And Following for Building Inspection
Yang Lyu, Muqing Cao, Shenghai Yuan et al.
Unmanned Aerial Vehicles (UAVs) have already demonstrated their potential in many civilian applications, and façade inspection is among the most promising. In this paper, we focus on enabling the autonomous perception and control of a small UAV for a façade inspection task. Specifically, we consider the perception as a planar object pose estimation problem by simplifying the building structure as a concatenation of planes, and the control as an optimal reference tracking control problem. First, a vision-based adaptive observer is proposed which can realize stable plane pose estimation under very mild observation conditions. Second, a model predictive controller is designed to achieve stable tracking and smooth transition in a multi-plane scenario, while the persistent excitation (PE) condition of the observer and the maneuver constraints of the UAV are satisfied. The proposed autonomous plane pose estimation and plane tracking methods are tested in both simulation and practical building façade inspection scenarios, which demonstrate their effectiveness and practicability.
RO · Dec 28, 2020
SPINS: Structure Priors aided Inertial Navigation System
Yang Lyu, Thien-Minh Nguyen, Liu Liu et al.
Although Simultaneous Localization and Mapping (SLAM) has been an active research topic for decades, current state-of-the-art methods still suffer from instability or inaccuracy due to feature insufficiency or inherent estimation drift in many civilian environments. To resolve these issues, we propose a navigation system combining SLAM and prior-map-based localization. Specifically, we consider additional integration of line and plane features, which are ubiquitous and more structurally salient in civilian environments, into the SLAM to ensure feature sufficiency and localization robustness. More importantly, we incorporate general prior map information into the SLAM to restrain its drift and improve the accuracy. To avoid rigorous association between prior information and local observations, we parameterize the prior knowledge as low-dimensional structural priors defined as relative distances/angles between different geometric primitives. The localization is formulated as a graph-based optimization problem that contains sliding-window-based variables and factors, including IMU, heterogeneous features, and structure priors. We also derive the analytical expressions of Jacobians of different factors to avoid the automatic differentiation overhead. To further alleviate the computation burden of incorporating structural prior factors, a selection mechanism is adopted based on the so-called information gain to incorporate only the most effective structure priors in the graph optimization. Finally, the proposed framework is extensively tested on synthetic data, public datasets, and, more importantly, on real UAV flight data obtained from a building inspection task. The results show that the proposed scheme can effectively improve the accuracy and robustness of localization for autonomous robots in civilian applications.
RO · Oct 25, 2020
LIRO: Tightly Coupled Lidar-Inertia-Ranging Odometry
Thien-Minh Nguyen, Muqing Cao, Shenghai Yuan et al.
In recent years, thanks to the continuously reduced cost and weight of 3D Lidar, applications of this type of sensor in the robotics community have become increasingly popular. Despite much progress, estimation drift and tracking loss are still prevalent concerns associated with these systems. However, in theory these issues can be resolved by using observations of fixed landmarks in the environment. This motivates us to investigate a tightly coupled sensor fusion scheme of Ultra-Wideband (UWB) range measurements with Lidar and inertia measurements. First, data from IMU, Lidar and UWB are associated with the robot's states on a sliding window based on their timestamps. Then, we construct a cost function comprising factors from UWB, Lidar and IMU preintegration measurements. Finally, an optimization process is carried out to estimate the robot's position and orientation. Through real-world experiments, we show that the method can effectively resolve the drift issue, while only requiring two or three anchors deployed in the environment.
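To make the idea of a UWB factor in such a cost function concrete, the sketch below evaluates a basic range residual and its gradient with respect to the robot position. It is a simplified illustration, not LIRO's actual factor: it assumes the UWB antenna sits at the robot position (no body offset or orientation coupling), and the function name is hypothetical.

```python
import math

def range_residual(position, anchor, measured_range):
    """Residual and position Jacobian of a point-to-anchor range factor.

    Hypothetical helper: residual = ||position - anchor|| - measured_range.
    position and anchor are sequences of equal length (e.g. 3D coordinates).
    """
    diff = [p - a for p, a in zip(position, anchor)]
    dist = math.sqrt(sum(c * c for c in diff))
    if dist == 0.0:
        # The gradient is undefined when the robot coincides with the anchor.
        raise ValueError("position coincides with anchor")
    residual = dist - measured_range
    # d(residual)/d(position) is the unit vector from anchor to position.
    jacobian = [c / dist for c in diff]
    return residual, jacobian
```

In a sliding-window optimizer, residuals of this kind would be squared, weighted by the range noise, and summed with the Lidar and IMU preintegration terms.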
RO · Oct 23, 2020
VIRAL-Fusion: A Visual-Inertial-Ranging-Lidar Sensor Fusion Approach
Thien-Minh Nguyen, Shenghai Yuan, Muqing Cao et al.
In recent years, Onboard Self Localization (OSL) methods based on cameras or Lidar have made significant progress. However, issues such as estimation drift and feature dependence remain inherent limitations. On the other hand, infrastructure-based methods can generally overcome these issues, but at the expense of some installation cost. This poses an interesting problem of how to effectively combine these methods, so as to achieve localization with long-term consistency as well as flexibility compared to any single method. To this end, we propose a comprehensive optimization-based estimator for the 15-dimensional state of an Unmanned Aerial Vehicle (UAV), fusing data from an extensive set of sensors: inertial measurement units (IMUs), Ultra-Wideband (UWB) ranging sensors, and multiple onboard Visual-Inertial and Lidar odometry subsystems. In essence, a sliding window is used to formulate a sequence of robot poses, where relative rotational and translational constraints between these poses are observed in the IMU preintegration and OSL observations, while orientation and position are coupled in body-offset UWB range observations. An optimization-based approach is developed to estimate the trajectory of the robot in this sliding window. We evaluate the performance of the proposed scheme in multiple scenarios, including experiments on public datasets, a high-fidelity graphical-physical simulator, and field-collected data from UAV flight tests. The results demonstrate that our integrated localization method can effectively resolve the drift issue, while incurring minimal installation requirements.
RO · Feb 25, 2020
Feasible Computationally Efficient Path Planning for UAV Collision Avoidance
Han Wang, Muqing Cao, Hao Jiang et al.
This paper presents a robust, computationally efficient real-time collision avoidance algorithm for Unmanned Aerial Vehicles (UAVs), namely the Memory-based Wall Following-Artificial Potential Field (MWF-APF) method. The new algorithm switches between the Wall-Following Method (WFM) and the Artificial Potential Field method (APF) with improved situation awareness capability. The historical trajectory is taken into account to avoid repeating wrong decisions. Furthermore, it can be effectively applied to platforms with low computing capability. As an example, a quad-rotor equipped with a limited number of Time-of-Flight (TOF) rangefinders is adopted to validate the effectiveness and efficiency of this algorithm. Both software simulation and physical flight tests have been conducted to demonstrate the capability of the MWF-APF method in complex scenarios.
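For readers unfamiliar with the APF component, the sketch below computes a classical artificial potential field force in 2D: linear attraction to the goal plus repulsion from obstacles inside an influence radius. It covers only the textbook APF part, not the wall-following mode, the switching rule, or the trajectory memory of MWF-APF; the function name, gains, and point-obstacle model are assumptions made for this example.

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, influence=2.0):
    """Classical 2D artificial potential field force at pos.

    Hypothetical helper: obstacles is a list of (x, y) points, and only
    those closer than `influence` contribute a repulsive term.
    """
    # Attractive term: gradient of a quadratic potential centred on the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            # Repulsive magnitude grows as d shrinks, vanishes at d = influence.
            gain = k_rep * (1.0 / d - 1.0 / influence) / d**2
            fx += gain * dx / d
            fy += gain * dy / d
    return fx, fy
```

A pure APF of this form can stall in local minima where attraction and repulsion cancel, which is a well-known limitation and a plausible reason for pairing it with a wall-following mode.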