Quanbo Ge

CV
h-index17
3papers
5citations
Novelty43%
AI Score43

3 Papers

7.8ROMay 26
Trust, Geometry, and Rules: A Credibility-Aware Reinforcement Learning Framework for Safe USV Navigation under Uncertainty

Yuhang Zhang, Shuqi Chai, Yukang Zhang et al.

Autonomous navigation of Unmanned Surface Vehicles (USVs) that is safe and compliant with the International Regulations for Preventing Collisions at Sea (COLREGs) remains a formidable challenge in dynamic maritime environments, particularly when perception systems exhibit miscalibrated uncertainty. Existing Reinforcement Learning (RL)-based methods often falter because state-estimation errors induce unreliable belief states that mislead the value function, while discrete traffic rules introduce discontinuity in the learning objective. To address these challenges, we propose a framework integrating credibility-aware learning, geometric safety shielding, and continuous rule-aware embedding. First, Credibility-Weighted Value Learning (CW-VL) introduces a dynamic trust factor derived from the discrepancy between filter-estimated covariance and empirical error statistics to modulate the critic's heteroscedastic loss, preventing policy overfitting to noisy samples. Second, the Covariance-Inflated Velocity Obstacle (CI-VO) maps position-estimation uncertainty into set-wise angular margins, forming a conservative geometric shield that overrides hazardous exploratory actions. Third, Risk-Aware COLREGs Duty Embedding relaxes binary encounter duties into continuous rule-aware signals, providing smooth sector-transition information and suppressing oscillation from sparse rule rewards. Simulated encounter studies demonstrate improved training robustness against perceptual inconsistency and superior collision avoidance and COLREGs compliance over baselines.

3.4SYApr 15
Cascaded TD3-PID Hybrid Controller for Quadrotor Trajectory Tracking in Wind Disturbance Environments

Yukang Zhang, Shuqi Chai, Yuhang Zhang et al.

This work presents a cascaded hybrid control framework for quadrotor trajectory tracking under nonlinear dynamics and external disturbances. In quadrotor systems, the altitude and attitude channels exhibit fast, structured dynamics that are well suited to reliable regulation, whereas horizontal-position control is more strongly affected by coupling effects, uncertainty, and disturbances, so that neither pure feedback control nor purely learning-based control alone is equally well suited to all channels. Accordingly, the proposed framework augments conventional proportional-integral-derivative (PID) stabilization for altitude and attitude control with an enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) agent incorporating a multi-Q-network structure, thereby improving horizontal-position control under severe disturbances. To further strengthen disturbance rejection in altitude and attitude control, a hybrid disturbance observer (HDOB) using low-pass and exponential moving average filtering is embedded in the control loops. The proposed TD3 enhancements are verified through ablation studies, and both numerical simulations and real-world flight tests on the quadrotor platform demonstrate that the proposed method achieves more accurate and robust trajectory tracking under wind disturbances than baseline approaches.

CVDec 10, 2024Code
Hallucination Elimination and Semantic Enhancement Framework for Vision-Language Models in Traffic Scenarios

Jiaqi Fan, Jianhua Wu, Hongqing Chu et al.

Large vision-language models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation tasks. However, these models occasionally generate hallucinatory texts, resulting in descriptions that seem reasonable but do not correspond to the image. This phenomenon can lead to wrong driving decisions of the autonomous driving system. To address this challenge, this paper proposes HCOENet, a plug-and-play chain-of-thought correction method designed to eliminate object hallucinations and generate enhanced descriptions for critical objects overlooked in the initial response. Specifically, HCOENet employs a cross-checking mechanism to filter entities and directly extracts critical objects from the given image, enriching the descriptive text. Experimental results on the POPE benchmark demonstrate that HCOENet improves the F1-score of the Mini-InternVL-4B and mPLUG-Owl3 models by 12.58% and 4.28%, respectively. Additionally, qualitative results using images collected in open campus scene further highlight the practical applicability of the proposed method. Compared with the GPT-4o model, HCOENet achieves comparable descriptive performance while significantly reducing costs. Finally, two novel semantic understanding datasets, CODA_desc and nuScenes_desc, are created for traffic scenarios to support future research. The codes and datasets are publicly available at https://github.com/fjq-tongji/HCOENet.