SY · Apr 13, 2023
Multi-kernel Correntropy-based Orientation Estimation of IMUs: Gradient Descent Methods
Shilei Li, Lijing Li, Dawei Shi et al.
This paper presents two computationally efficient algorithms for the orientation estimation of inertial measurement units (IMUs): the correntropy-based gradient descent (CGD) and the correntropy-based decoupled orientation estimation (CDOE). Traditional methods, such as gradient descent (GD) and decoupled orientation estimation (DOE), rely on the mean squared error (MSE) criterion, making them vulnerable to external acceleration and magnetic interference. To address this issue, we demonstrate that the multi-kernel correntropy loss (MKCL) is an optimal objective function for maximum likelihood estimation (MLE) when the noise follows a type of heavy-tailed distribution. In certain situations, the estimation error of the MKCL is bounded even in the presence of arbitrarily large outliers. By replacing the standard MSE cost function with MKCL, we develop the CGD and CDOE algorithms. We evaluate the effectiveness of our proposed methods by comparing them with existing algorithms in various situations. Experimental results indicate that our proposed methods (CGD and CDOE) outperform their conventional counterparts (GD and DOE), especially when faced with external acceleration and magnetic disturbances. Furthermore, the new algorithms demonstrate significantly lower computational complexity than Kalman filter-based approaches, making them suitable for applications with low-cost microprocessors.
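The robustness claim in this abstract can be illustrated with a minimal sketch of a correntropy-style loss. It assumes one Gaussian kernel bandwidth per residual channel and the common form sigma^2 * (1 - exp(-e^2 / (2 sigma^2))); the function names `mkcl_loss` and `mkcl_grad` are hypothetical, and the exact MKCL definition used in the paper may differ.

```python
import numpy as np

def mkcl_loss(e, sigmas):
    """Correntropy-style loss summed over residual channels (assumed form).

    Each channel i has its own bandwidth sigmas[i]. The loss behaves like
    0.5 * e**2 for small residuals and saturates at sigmas[i]**2 for large ones.
    """
    e = np.asarray(e, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return float(np.sum(sigmas**2 * (1.0 - np.exp(-e**2 / (2.0 * sigmas**2)))))

def mkcl_grad(e, sigmas):
    """Gradient of the loss above with respect to each residual.

    The MSE gradient would be e itself; here it is scaled by a Gaussian
    weight that tends to zero as the residual grows, so outliers are ignored.
    """
    e = np.asarray(e, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return e * np.exp(-e**2 / (2.0 * sigmas**2))

# One small residual and one gross outlier (e.g. a burst of external acceleration).
residuals = np.array([0.1, 100.0])
bandwidths = np.array([1.0, 1.0])
grad = mkcl_grad(residuals, bandwidths)
loss = mkcl_loss(residuals, bandwidths)
```

Under these assumptions the outlier channel contributes a near-zero gradient, which is one way to read the statement that the estimation error can stay bounded for arbitrarily large outliers.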
RO · Feb 4
AppleVLM: End-to-end Autonomous Driving with Advanced Perception and Planning-Enhanced Vision-Language Models
Yuxuan Han, Kunyuan Wu, Qianyi Shao et al.
End-to-end autonomous driving has emerged as a promising paradigm integrating perception, decision-making, and control within a unified learning framework. Recently, Vision-Language Models (VLMs) have gained significant attention for their potential to enhance the robustness and generalization of end-to-end driving models in diverse and unseen scenarios. However, existing VLM-based approaches still face challenges, including suboptimal lane perception, language understanding biases, and difficulties in handling corner cases. To address these issues, we propose AppleVLM, an advanced perception and planning-enhanced VLM model for robust end-to-end driving. AppleVLM introduces a novel vision encoder and a planning strategy encoder to improve perception and decision-making. Firstly, the vision encoder fuses spatial-temporal information from multi-view images across multiple timesteps using a deformable transformer mechanism, enhancing robustness to camera variations and facilitating scalable deployment across different vehicle platforms. Secondly, unlike traditional VLM-based approaches, AppleVLM introduces a dedicated planning modality that encodes explicit Bird's-Eye-View spatial information, mitigating language biases in navigation instructions. Finally, a VLM decoder fine-tuned by a hierarchical Chain-of-Thought integrates vision, language, and planning features to output robust driving waypoints. We evaluate AppleVLM in closed-loop experiments on two CARLA benchmarks, achieving state-of-the-art driving performance. Furthermore, we deploy AppleVLM on an AGV platform and successfully showcase real-world end-to-end autonomous driving in complex outdoor environments.
11.7 · RO · Mar 31
Interacting Multiple Model Proprioceptive Odometry for Legged Robots
Wanlei Li, Zichang Chen, Shilei Li et al.
State estimation for legged robots remains challenging because legged odometry generally suffers from limited observability and therefore depends critically on measurement constraints to suppress drift. When exteroceptive sensors are unreliable or degraded, such constraints are mainly derived from proprioceptive measurements, particularly contact-related leg kinematics information. However, most existing proprioceptive odometry methods rely on an idealized point-contact assumption, which is often violated during real locomotion. Consequently, the effectiveness of proprioceptive constraints may be significantly reduced, resulting in degraded estimation accuracy. To address these limitations, we propose an interacting multiple model (IMM)-based proprioceptive odometry framework for legged robots. By incorporating multiple contact hypotheses within a unified probabilistic framework, the proposed method enables online mode switching and probabilistic fusion under varying contact conditions. Extensive simulations and real-world experiments demonstrate that the proposed method achieves superior pose estimation accuracy over state-of-the-art methods while maintaining comparable computational efficiency.
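The mode switching and probabilistic fusion mentioned here can be sketched with the textbook IMM mode-probability update. This is only an illustration under assumptions: the two modes stand in for hypothetical contact hypotheses, the transition matrix and likelihood values are made up, and the sketch omits the state mixing step and covariance fusion of a full IMM filter. The names `imm_mode_update` and `imm_fuse` are not from the paper.

```python
import numpy as np

def imm_mode_update(mu_prev, transition, likelihoods):
    """Standard IMM mode-probability update.

    mu_prev:      prior mode probabilities, shape (M,)
    transition:   Markov matrix, transition[i, j] = P(mode j | mode i)
    likelihoods:  measurement likelihood of each mode-matched filter, shape (M,)
    """
    mu_prev = np.asarray(mu_prev, dtype=float)
    transition = np.asarray(transition, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    predicted = transition.T @ mu_prev        # c_j = sum_i pi_ij * mu_i
    unnormalized = likelihoods * predicted    # Bayes numerator per mode
    return unnormalized / unnormalized.sum()

def imm_fuse(mu, mode_estimates):
    """Probability-weighted combination of the per-mode state estimates."""
    return np.asarray(mu, dtype=float) @ np.asarray(mode_estimates, dtype=float)

# Two hypothetical contact modes, e.g. "point contact" and "non-ideal contact".
mu = imm_mode_update([0.5, 0.5],
                     [[0.9, 0.1], [0.1, 0.9]],
                     [0.2, 0.8])
fused = imm_fuse(mu, [[0.0, 0.0], [1.0, 2.0]])
```

With these illustrative numbers the second mode explains the measurement better, so its probability rises and the fused estimate moves toward that mode's state.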
CV · May 20, 2025 · Code
4D-ROLLS: 4D Radar Occupancy Learning via LiDAR Supervision
Ruihan Liu, Xiaoyi Wu, Xijun Chen et al.
A comprehensive understanding of 3D scenes is essential for autonomous vehicles (AVs), and among various perception tasks, occupancy estimation plays a central role by providing a general representation of drivable and occupied space. However, most existing occupancy estimation methods rely on LiDAR or cameras, which perform poorly in degraded environments such as smoke, rain, snow, and fog. In this paper, we propose 4D-ROLLS, the first weakly supervised occupancy estimation method for 4D radar using the LiDAR point cloud as the supervisory signal. Specifically, we introduce a method for generating pseudo-LiDAR labels, including occupancy queries and LiDAR height maps, as multi-stage supervision to train the 4D radar occupancy estimation model. Then the model is aligned with the occupancy map produced by LiDAR, fine-tuning its accuracy in occupancy estimation. Extensive comparative experiments validate the exceptional performance of 4D-ROLLS. Its robustness in degraded environments and effectiveness in cross-dataset training are qualitatively demonstrated. The model is also seamlessly transferred to two downstream tasks, BEV segmentation and point cloud occupancy prediction, highlighting its potential for broader applications. The lightweight network enables the 4D-ROLLS model to achieve fast inference at about 30 Hz on a 4060 GPU. The code of 4D-ROLLS will be made available at https://github.com/CLASS-Lab/4D-ROLLS.
56.1 · RO · May 4
LiDAR Teach, Radar Repeat: Robust Cross-Modal Navigation in Degenerate and Varying Environments
Renxiang Xiao, Yichen Chen, Yuanfan Zhang et al.
Long-term autonomy requires robust navigation in environments subject to dynamic and static changes, as well as adverse weather conditions. Teach-and-Repeat (T&R) navigation offers a reliable and cost-effective solution by avoiding the need for consistent global mapping; however, existing T&R systems lack a systematic solution to tackle various environmental variations such as weather degradation, ephemeral dynamics, and structural changes. This work proposes LTR$^2$, the first cross-modal, cross-platform LiDAR-Teach-and-Radar-Repeat system that systematically addresses these challenges. LTR$^2$ leverages LiDAR during the teaching phase to capture precise structural information under normal conditions and utilizes 4D millimeter-wave radar during the repeating phase for robust operation under environmental degradations. To align sparse and noisy forward-looking 4D radar with dense and accurate omnidirectional 3D LiDAR data, we introduce a Cross-Modal Registration (CMR) network that jointly exploits Doppler-based motion priors and the physical laws governing LiDAR intensity and radar power density. Furthermore, we propose an adaptive fine-tuning strategy that incrementally updates the CMR network based on localization errors, enabling long-term adaptability to static environmental changes without ground-truth labels. We demonstrate that the proposed CMR network achieves state-of-the-art cross-modal registration performance on the open-access dataset. Then we validate LTR$^2$ across three robot platforms over a large-scale, long-term deployment (40+ km over 6 months), including challenging conditions such as nighttime smoke. Experimental results and ablation studies demonstrate centimeter-level accuracy and strong robustness against diverse environmental disturbances, significantly outperforming existing approaches.
SY · Oct 1, 2021
Error-free approximation of explicit linear MPC through lattice piecewise affine expression
Jun Xu, Yunjiang Lou, Bart De Schutter et al.
In this paper, the disjunctive and conjunctive lattice piecewise affine (PWA) approximations of explicit linear model predictive control (MPC) are proposed. The training data are generated uniformly in the domain of interest, consisting of the state samples and corresponding affine control laws, based on which the lattice PWA approximations are constructed. Re-sampling of data is also proposed to guarantee that the lattice PWA approximations are identical to the explicit MPC control law in the unique order (UO) regions containing the sample points as interior points. Additionally, under mild assumptions, the equivalence of the two lattice PWA approximations guarantees that the approximations are error-free in the domain of interest. Algorithms for deriving a statistically error-free approximation to the explicit linear MPC are proposed, and the complexity of the entire procedure is analyzed and shown to be polynomial with respect to the number of samples. The performance of the proposed approximation strategy is tested through two simulation examples, and the results show that with a moderate number of sample points, we can construct lattice PWA approximations that are equivalent to the optimal control law of the explicit linear MPC.
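As a toy illustration of the two lattice forms, the sketch below evaluates a disjunctive expression (max over terms of min over affine pieces) and a conjunctive expression (min over terms of max over affine pieces) for a one-dimensional saturated control law. The affine pieces, index sets, and function names are assumptions chosen for the example; the paper's sampling and construction algorithms are not reproduced here.

```python
import numpy as np

def lattice_disjunctive(x, A, b, index_sets):
    """Disjunctive lattice PWA form: max over terms of the min of affine pieces."""
    vals = A @ np.atleast_1d(x) + b            # value of every affine piece at x
    return float(max(min(vals[j] for j in s) for s in index_sets))

def lattice_conjunctive(x, A, b, index_sets):
    """Conjunctive lattice PWA form: min over terms of the max of affine pieces."""
    vals = A @ np.atleast_1d(x) + b
    return float(min(max(vals[j] for j in s) for s in index_sets))

# Affine pieces: l0(x) = x, l1(x) = 1, l2(x) = -1 (a saturated linear law).
A = np.array([[1.0], [0.0], [0.0]])
b = np.array([0.0, 1.0, -1.0])
disj_sets = [[0, 1], [2]]    # max(min(x, 1), -1)
conj_sets = [[0, 2], [1]]    # min(max(x, -1), 1)

samples = [-3.0, -0.5, 0.0, 0.5, 3.0]
disj = [lattice_disjunctive(x, A, b, disj_sets) for x in samples]
conj = [lattice_conjunctive(x, A, b, conj_sets) for x in samples]
```

In this example both forms reproduce the saturation function at every sample, which mirrors, in a very small setting, the idea of checking that the disjunctive and conjunctive approximations coincide.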