ROJun 23, 2025Code
Radar and Event Camera Fusion for Agile Robot Ego-Motion EstimationYang Lyu, Zhenghao Zou, Yanfeng Li et al.
Achieving reliable ego motion estimation for agile robots, e.g., aerobatic aircraft, remains challenging because most robot sensors fail to respond timely and clearly to highly dynamic robot motions, often resulting in measurement blurring, distortion, and delays. In this paper, we propose an IMU-free and feature-association-free framework to achieve aggressive ego-motion velocity estimation of a robot platform in highly dynamic scenarios by combining two types of exteroceptive sensors, an event camera and a millimeter wave radar, First, we used instantaneous raw events and Doppler measurements to derive rotational and translational velocities directly. Without a sophisticated association process between measurement frames, the proposed method is more robust in texture-less and structureless environments and is more computationally efficient for edge computing devices. Then, in the back-end, we propose a continuous-time state-space model to fuse the hybrid time-based and event-based measurements to estimate the ego-motion velocity in a fixed-lagged smoother fashion. In the end, we validate our velometer framework extensively in self-collected experiment datasets. The results indicate that our IMU-free and association-free ego motion estimation framework can achieve reliable and efficient velocity output in challenging environments. The source code, illustrative video and dataset are available at https://github.com/ZzhYgwh/TwistEstimator.
RONov 27, 2024Code
Monocular Obstacle Avoidance Based on Inverse PPO for Fixed-wing UAVsHaochen Chai, Meimei Su, Yang Lyu et al.
Fixed-wing Unmanned Aerial Vehicles (UAVs) are one of the most commonly used platforms for the burgeoning Low-altitude Economy (LAE) and Urban Air Mobility (UAM), due to their long endurance and high-speed capabilities. Classical obstacle avoidance systems, which rely on prior maps or sophisticated sensors, face limitations in unknown low-altitude environments and small UAV platforms. In response, this paper proposes a lightweight deep reinforcement learning (DRL) based UAV collision avoidance system that enables a fixed-wing UAV to avoid unknown obstacles at cruise speed over 30m/s, with only onboard visual sensors. The proposed system employs a single-frame image depth inference module with a streamlined network architecture to ensure real-time obstacle detection, optimized for edge computing devices. After that, a reinforcement learning controller with a novel reward function is designed to balance the target approach and flight trajectory smoothness, satisfying the specific dynamic constraints and stability requirements of a fixed-wing UAV platform. An adaptive entropy adjustment mechanism is introduced to mitigate the exploration-exploitation trade-off inherent in DRL, improving training convergence and obstacle avoidance success rates. Extensive software-in-the-loop and hardware-in-the-loop experiments demonstrate that the proposed framework outperforms other methods in obstacle avoidance efficiency and flight trajectory smoothness and confirm the feasibility of implementing the algorithm on edge devices. The source code is publicly available at \url{https://github.com/ch9397/FixedWing-MonoPPO}.
MLMay 5, 2025
Resolving Memorization in Empirical Diffusion Model for Manifold Data in High-Dimensional SpacesYang Lyu, Tan Minh Nguyen, Yuchun Qian et al.
Diffusion models are popular tools for generating new data samples, using a forward process that adds noise to data and a reverse process to denoise and produce samples. However, when the data distribution consists of n points, empirical diffusion models tend to reproduce existing data points, a phenomenon known as the memorization effect. Current literature often addresses this with complex machine learning techniques. This work shows that the memorization issue can be solved simply by applying an inertia update at the end of the empirical diffusion simulation. Our inertial diffusion model requires only the empirical score function and no additional training. We demonstrate that the distribution of samples from this model approximates the true data distribution on a $C^2$ manifold of dimension $d$, within a Wasserstein-1 distance of order $O(n^{-\frac{2}{d+4}})$. This bound significantly shrinks the Wasserstein distance between the population and empirical distributions, confirming that the inertial diffusion model produces new and diverse samples. Remarkably, this estimate is independent of the ambient space dimension, as no further training is needed. Our analysis shows that the inertial diffusion samples resemble Gaussian kernel density estimations on the manifold, revealing a novel connection between diffusion models and manifold learning.
AISep 30, 2025
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research BenchmarkMinhui Zhu, Minyang Tian, Xiaocheng Yang et al.
While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present the CritPt (Complex Research using Integrated Thinking - Physics Test, pronounced "critical point"), the first benchmark designed to test LLMs on unpublished, research-level reasoning tasks that broadly covers modern physics research areas, including condensed matter, quantum physics, atomic, molecular & optical physics, astrophysics, high energy physics, mathematical physics, statistical physics, nuclear physics, nonlinear dynamics, fluid dynamics and biophysics. CritPt consists of 71 composite research challenges designed to simulate full-scale research projects at the entry level, which are also decomposed to 190 simpler checkpoint tasks for more fine-grained insights. All problems are newly created by 50+ active physics researchers based on their own research. Every problem is hand-curated to admit a guess-resistant and machine-verifiable answer and is evaluated by an automated grading pipeline heavily customized for advanced physics-specific output formats. We find that while current state-of-the-art LLMs show early promise on isolated checkpoints, they remain far from being able to reliably solve full research-scale challenges: the best average accuracy among base models is only 5.7%, achieved by GPT-5 (high), moderately rising to around 10% when equipped with coding tools. Through the realistic yet standardized evaluation offered by CritPt, we highlight a large disconnect between current model capabilities and realistic physics research demands, offering a foundation to guide the development of scientifically grounded AI tools.
CVJul 15, 2025
COLI: A Hierarchical Efficient Compressor for Large ImagesHaoran Wang, Hanyu Pei, Yang Lyu et al.
The escalating adoption of high-resolution, large-field-of-view imagery amplifies the need for efficient compression methodologies. Conventional techniques frequently fail to preserve critical image details, while data-driven approaches exhibit limited generalizability. Implicit Neural Representations (INRs) present a promising alternative by learning continuous mappings from spatial coordinates to pixel intensities for individual images, thereby storing network weights rather than raw pixels and avoiding the generalization problem. However, INR-based compression of large images faces challenges including slow compression speed and suboptimal compression ratios. To address these limitations, we introduce COLI (Compressor for Large Images), a novel framework leveraging Neural Representations for Videos (NeRV). First, recognizing that INR-based compression constitutes a training process, we accelerate its convergence through a pretraining-finetuning paradigm, mixed-precision training, and reformulation of the sequential loss into a parallelizable objective. Second, capitalizing on INRs' transformation of image storage constraints into weight storage, we implement Hyper-Compression, a novel post-training technique to substantially enhance compression ratios while maintaining minimal output distortion. Evaluations across two medical imaging datasets demonstrate that COLI consistently achieves competitive or superior PSNR and SSIM metrics at significantly reduced bits per pixel (bpp), while accelerating NeRV training by up to 4 times.
ROFeb 1, 2022
NTU VIRAL: A Visual-Inertial-Ranging-Lidar Dataset, From an Aerial Vehicle ViewpointThien-Minh Nguyen, Shenghai Yuan, Muqing Cao et al.
In recent years, autonomous robots have become ubiquitous in research and daily life. Among many factors, public datasets play an important role in the progress of this field, as they waive the tall order of initial investment in hardware and manpower. However, for research on autonomous aerial systems, there appears to be a relative lack of public datasets on par with those used for autonomous driving and ground robots. Thus, to fill in this gap, we conduct a data collection exercise on an aerial platform equipped with an extensive and unique set of sensors: two 3D lidars, two hardware-synchronized global-shutter cameras, multiple Inertial Measurement Units (IMUs), and especially, multiple Ultra-wideband (UWB) ranging units. The comprehensive sensor suite resembles that of an autonomous driving car, but features distinct and challenging characteristics of aerial operations. We record multiple datasets in several challenging indoor and outdoor conditions. Calibration results and ground truth from a high-accuracy laser tracker are also included in each package. All resources can be accessed via our webpage https://ntu-aris.github.io/ntu_viral_dataset.
ROApr 24, 2021
MILIOM: Tightly Coupled Multi-Input Lidar-Inertia Odometry and MappingThien-Minh Nguyen, Shenghai Yuan, Muqing Cao et al.
In this letter we investigate a tightly coupled Lidar-Inertia Odometry and Mapping (LIOM) scheme, with the capability to incorporate multiple lidars with complementary field of view (FOV). In essence, we devise a time-synchronized scheme to combine extracted features from separate lidars into a single pointcloud, which is then used to construct a local map and compute the feature-map matching (FMM) coefficients. These coefficients, along with the IMU preinteration observations, are then used to construct a factor graph that will be optimized to produce an estimate of the sliding window trajectory. We also propose a key frame-based map management strategy to marginalize certain poses and pointclouds in the sliding window to grow a global map, which is used to assemble the local map in the later stage. The use of multiple lidars with complementary FOV and the global map ensures that our estimate has low drift and can sustain good localization in situations where single lidar use gives poor result, or even fails to work. Multi-thread computation implementations are also adopted to fractionally cut down the computation time and ensure real-time performance. We demonstrate the efficacy of our system via a series of experiments on public datasets collected from an aerial vehicle.
ROFeb 2, 2021
Vision Based Autonomous UAV Plane Estimation And Following for Building InspectionYang Lyu, Muqing Cao, Shenghai Yuan et al.
Unmanned Aerial Vehicle (UAV) has already demonstrated its potential in many civilian applications, and the façade inspection is among the most promising ones. In this paper, we focus on enabling the autonomous perception and control of a small UAV for a façade inspection task. Specifically, we consider the perception as a planar object pose estimation problem by simplifying the building structure as concatenation of planes, and the control as an optimal reference tracking control problem. First, a vision based adaptive observer is proposed which can realize stable plane pose estimation under very mild observation conditions. Second, a model predictive controller is designed to achieve stable tracking and smooth transition in a multi-plane scenario, while the persistent excitation (PE) condition of the observer and the maneuver constraints of the UAV are satisfied. The proposed autonomous plane pose estimation and plane tracking methods are tested in both simulation and practical building fasçade inspection scenarios, which demonstrate their effectiveness and practicability.
RODec 28, 2020
SPINS: Structure Priors aided Inertial Navigation SystemYang Lyu, Thien-Minh Nguyen, Liu Liu et al.
Although Simultaneous Localization and Mapping (SLAM) has been an active research topic for decades, current state-of-the-art methods still suffer from instability or inaccuracy due to feature insufficiency or its inherent estimation drift, in many civilian environments. To resolve these issues, we propose a navigation system combing the SLAM and prior-map-based localization. Specifically, we consider additional integration of line and plane features, which are ubiquitous and more structurally salient in civilian environments, into the SLAM to ensure feature sufficiency and localization robustness. More importantly, we incorporate general prior map information into the SLAM to restrain its drift and improve the accuracy. To avoid rigorous association between prior information and local observations, we parameterize the prior knowledge as low dimensional structural priors defined as relative distances/angles between different geometric primitives. The localization is formulated as a graph-based optimization problem that contains sliding-window-based variables and factors, including IMU, heterogeneous features, and structure priors. We also derive the analytical expressions of Jacobians of different factors to avoid the automatic differentiation overhead. To further alleviate the computation burden of incorporating structural prior factors, a selection mechanism is adopted based on the so-called information gain to incorporate only the most effective structure priors in the graph optimization. Finally, the proposed framework is extensively tested on synthetic data, public datasets, and, more importantly, on the real UAV flight data obtained from a building inspection task. The results show that the proposed scheme can effectively improve the accuracy and robustness of localization for autonomous robots in civilian applications.
ROOct 25, 2020
LIRO: Tightly Coupled Lidar-Inertia-Ranging OdometryThien-Minh Nguyen, Muqing Cao, Shenghai Yuan et al.
In recent years, thanks to the continuously reduced cost and weight of 3D Lidar, the applications of this type of sensor in robotics community have become increasingly popular. Despite many progresses, estimation drift and tracking loss are still prevalent concerns associated with these systems. However, in theory these issues can be resolved with the use of some observations to fixed landmarks in the environments. This motivates us to investigate a tightly coupled sensor fusion scheme of Ultra-Wideband (UWB) range measurements with Lidar and inertia measurements. First, data from IMU, Lidar and UWB are associated with the robot's states on a sliding windows based on their timestamps. Then, we construct a cost function comprising of factors from UWB, Lidar and IMU preintegration measurements. Finally an optimization process is carried out to estimate the robot's position and orientation. Via some real world experiments, we show that the method can effectively resolve the drift issue, while only requiring two or three anchors deployed in the environment.
ROOct 23, 2020
VIRAL-Fusion: A Visual-Inertial-Ranging-Lidar Sensor Fusion ApproachThien-Minh Nguyen, Shenghai Yuan, Muqing Cao et al.
In recent years, Onboard Self Localization (OSL) methods based on cameras or Lidar have achieved many significant progresses. However, some issues such as estimation drift and feature-dependence still remain inherent limitations. On the other hand, infrastructure-based methods can generally overcome these issues, but at the expense of some installation cost. This poses an interesting problem of how to effectively combine these methods, so as to achieve localization with long-term consistency as well as flexibility compared to any single method. To this end, we propose a comprehensive optimization-based estimator for 15-dimensional state of an Unmanned Aerial Vehicle (UAV), fusing data from an extensive set of sensors: inertial measurement units (IMUs), Ultra-Wideband (UWB) ranging sensors, and multiple onboard Visual-Inertial and Lidar odometry subsystems. In essence, a sliding window is used to formulate a sequence of robot poses, where relative rotational and translational constraints between these poses are observed in the IMU preintegration and OSL observations, while orientation and position are coupled in body-offset UWB range observations. An optimization-based approach is developed to estimate the trajectory of the robot in this sliding window. We evaluate the performance of the proposed scheme in multiple scenarios, including experiments on public datasets, high-fidelity graphical-physical simulator, and field-collected data from UAV flight tests. The result demonstrates that our integrated localization method can effectively resolve the drift issue, while incurring minimal installation requirements.
ROJun 1, 2018
Multi-vehicle Flocking Control with Deep Deterministic Policy Gradient MethodYang Lyu, Quan Pan, Jinwen Hu et al.
Flocking control has been studied extensively along with the wide application of multi-vehicle systems. In this paper the Multi-vehicles System (MVS) flocking control with collision avoidance and communication preserving is considered based on the deep reinforcement learning framework. Specifically the deep deterministic policy gradient (DDPG) with centralized training and distributed execution process is implemented to obtain the flocking control policy. First, to avoid the dynamically changed observation of state, a three layers tensor based representation of the observation is used so that the state remains constant although the observation dimension is changing. A reward function is designed to guide the way-points tracking, collision avoidance and communication preserving. The reward function is augmented by introducing the local reward function of neighbors. Finally, a centralized training process which trains the shared policy based on common training set among all agents. The proposed method is tested under simulated scenarios with different setup.