38.6ROJun 1
Dynamics Are Learned, Not Told: Semi-Supervised Discovery of Latent Dynamics Geometries For Zero-Shot Policy AdaptationZhiming Xu, Weitao Zhou, Xianghui Pan et al.
Real-world dynamics shifts pose a critical challenge for reinforcement learning in robotics, as policies tightly coupled to nominal environments often fail catastrophically when physical conditions change. Most existing methods rely on encoding explicitly identified physical parameters into a latent context, a parameter-centric paradigm that depends on pre-specified axes of variation and becomes brittle under unmodeled or compound dynamics changes. We revisit dynamics adaptation from an outcome-centric perspective: rather than telling policies what the dynamics are, we enable them to learn how dynamics affect interaction outcomes. Theoretically, this is grounded in a monotonic relationship between target-domain regret and the Lipschitz constant of a trajectory dynamics encoder. Practically, this constant can be upper-bounded through contrastive learning, yielding a smooth, task-relevant latent topology without privileged dynamics information. On MuJoCo benchmarks, our method consistently outperforms parameter-centric baselines under severe dynamics shifts, including unmodeled and time-varying parameters, while also improving in-distribution stability and latent interpretability. Overall, these results validate that controlling latent geometry is a principled mechanism for robust adaptation.
AIJul 2, 2022
Long-Tail Prediction Uncertainty Aware Trajectory Planning for Self-driving VehiclesWeitao Zhou, Zhong Cao, Yunkang Xu et al.
A typical trajectory planner of autonomous driving commonly relies on predicting the future behavior of surrounding obstacles. Recently, deep learning technology has been widely adopted to design prediction models due to their impressive performance. However, such models may fail in the "long-tail" driving cases where the training data is sparse or unavailable, leading to planner failures. To this end, this work proposes a trajectory planner to consider the prediction model uncertainty arising from insufficient data for safer performance. Firstly, an ensemble network structure estimates the prediction model's uncertainty due to insufficient training data. Then a trajectory planner is designed to consider the worst-case arising from prediction uncertainty. The results show that the proposed method can improve the safety of trajectory planning under the prediction uncertainty caused by insufficient data. At the same time, with sufficient data, the framework will not lead to overly conservative results. This technology helps to improve the safety and reliability of autonomous vehicles under the long-tail data distribution of the real world.
RODec 31, 2025
LSRE: Latent Semantic Rule Encoding for Real-Time Semantic Risk Detection in Autonomous DrivingQian Cheng, Weitao Zhou, Cheng Jing et al.
Real-world autonomous driving must adhere to complex human social rules that extend beyond legally codified traffic regulations. Many of these semantic constraints, such as yielding to emergency vehicles, complying with traffic officers' gestures, or stopping for school buses, are intuitive for humans yet difficult to encode explicitly. Although large vision-language models (VLMs) can interpret such semantics, their inference cost makes them impractical for real-time deployment. This work proposes LSRE, a Latent Semantic Rule Encoding framework that converts sparsely sampled VLM judgments into decision boundaries within the latent space of a recurrent world model. By encoding language-defined safety semantics into a lightweight latent classifier, LSRE enables real-time semantic risk assessment at 10 Hz without per-frame VLM queries. Experiments on six semantic-failure scenarios in CARLA demonstrate that LSRE attains semantic risk detection accuracy comparable to a large VLM baseline, while providing substantially earlier hazard anticipation and maintaining low computational latency. LSRE further generalizes to rarely seen semantic-similar test cases, indicating that language-guided latent classification offers an effective and deployable mechanism for semantic safety monitoring in autonomous driving.
ROFeb 28, 2025
Dynamically Local-Enhancement Planner for Large-Scale Autonomous DrivingNanshan Deng, Weitao Zhou, Bo Zhang et al.
Current autonomous vehicles operate primarily within limited regions, but there is increasing demand for broader applications. However, as models scale, their limited capacity becomes a significant challenge for adapting to novel scenarios. It is increasingly difficult to improve models for new situations using a single monolithic model. To address this issue, we introduce the concept of dynamically enhancing a basic driving planner with local driving data, without permanently modifying the planner itself. This approach, termed the Dynamically Local-Enhancement (DLE) Planner, aims to improve the scalability of autonomous driving systems without significantly expanding the planner's size. Our approach introduces a position-varying Markov Decision Process formulation coupled with a graph neural network that extracts region-specific driving features from local observation data. The learned features describe the local behavior of the surrounding objects, which is then leveraged to enhance a basic reinforcement learning-based policy. We evaluated our approach in multiple scenarios and compared it with a one-for-all driving model. The results show that our method outperforms the baseline policy in both safety (collision rate) and average reward, while maintaining a lighter scale. This approach has the potential to benefit large-scale autonomous vehicles without the need for largely expanding on-device driving models.
ROMay 12, 2023
Dynamically Conservative Self-Driving Planner for Long-Tail CasesWeitao Zhou, Zhong Cao, Nanshan Deng et al.
Self-driving vehicles (SDVs) are becoming reality but still suffer from "long-tail" challenges during natural driving: the SDVs will continually encounter rare, safety-critical cases that may not be included in the dataset they were trained. Some safety-assurance planners solve this problem by being conservative in all possible cases, which may significantly affect driving mobility. To this end, this work proposes a method to automatically adjust the conservative level according to each case's "long-tail" rate, named dynamically conservative planner (DCP). We first define the "long-tail" rate as an SDV's confidence to pass a driving case. The rate indicates the probability of safe-critical events and is estimated using the statistics bootstrapped method with historical data. Then, a reinforcement learning-based planner is designed to contain candidate policies with different conservative levels. The final policy is optimized based on the estimated "long-tail" rate. In this way, the DCP is designed to automatically adjust to be more conservative in low-confidence "long-tail" cases while keeping efficient otherwise. The DCP is evaluated in the CARLA simulator using driving cases with "long-tail" distributed training data. The results show that the DCP can accurately estimate the "long-tail" rate to identify potential risks. Based on the rate, the DCP automatically avoids potential collisions in "long-tail" cases using conservative decisions while not affecting the average velocity in other typical cases. Thus, the DCP is safer and more efficient than the baselines with fixed conservative levels, e.g., an always conservative planner. This work provides a technique to guarantee SDV's performance in unexpected driving cases without resorting to a global conservative setting, which contributes to solving the "long-tail" problem practically.
AIMay 12, 2023
Identify, Estimate and Bound the Uncertainty of Reinforcement Learning for Autonomous DrivingWeitao Zhou, Zhong Cao, Nanshan Deng et al.
Deep reinforcement learning (DRL) has emerged as a promising approach for developing more intelligent autonomous vehicles (AVs). A typical DRL application on AVs is to train a neural network-based driving policy. However, the black-box nature of neural networks can result in unpredictable decision failures, making such AVs unreliable. To this end, this work proposes a method to identify and protect unreliable decisions of a DRL driving policy. The basic idea is to estimate and constrain the policy's performance uncertainty, which quantifies potential performance drop due to insufficient training data or network fitting errors. By constraining the uncertainty, the DRL model's performance is always greater than that of a baseline policy. The uncertainty caused by insufficient data is estimated by the bootstrapped method. Then, the uncertainty caused by the network fitting error is estimated using an ensemble network. Finally, a baseline policy is added as the performance lower bound to avoid potential decision failures. The overall framework is called uncertainty-bound reinforcement learning (UBRL). The proposed UBRL is evaluated on DRL policies with different amounts of training data, taking an unprotected left-turn driving case as an example. The result shows that the UBRL method can identify potentially unreliable decisions of DRL policy. The UBRL guarantees to outperform baseline policy even when the DRL policy is not well-trained and has high uncertainty. Meanwhile, the performance of UBRL improves with more training data. Such a method is valuable for the DRL application on real-road driving and provides a metric to evaluate a DRL policy.