48.1ROMay 27Code
Provably Guaranteed Polytopic Uncertainty Quantification for SLAMGuangyang Zeng, Yulong Gao, Yuan Shen et al.
In safety-critical robotics applications, guaranteed and practical uncertainty quantification (UQ) in perception is vital. Many existing works either offer no formal containment guarantee, rely on restrictive modeling assumptions, or focus only on pose estimation rather than a complete SLAM pipeline. This paper presents provably guaranteed UQ algorithms for 3D-3D landmark-based SLAM. The algorithms consist of three basic UQ modules: forward UQ for mapping, backward UQ for pose tracking, and pose compound. Each module produces a certified uncertainty set; when the input uncertainty bounds are deterministic, the output sets inherit deterministic guarantees, i.e., they provably contain the true poses and landmarks. Specifically, we use polytopes to represent uncertainty sets, enabling tractable computations and a unified treatment of pose uncertainty. To enhance algorithms' practical usability, we incorporate conformal prediction to calibrate measurement uncertainty from data with prescribed probability. Simulations and experiments demonstrate that the proposed algorithms provide both strong theoretical guarantees and practical usability. The code is open-sourced at https://github.com/LIAS-CUHKSZ/Polytopic-SLAM-Uncertainty-Quantification.
SYJul 5, 2021
Computing Probabilistic Controlled Invariant SetsYulong Gao, Karl H. Johansson, Lihua Xie
This paper investigates stochastic invariance for control systems through probabilistic controlled invariant sets (PCISs). As a natural complement to robust controlled invariant sets~(RCISs), we propose finite- and infinite-horizon PCISs, and explore their relation to RICSs. We design iterative algorithms to compute the PCIS within a given set. For systems with discrete spaces, the computations of the finite- and infinite-horizon PCISs at each iteration are based on linear programming and mixed integer linear programming, respectively. The algorithms are computationally tractable and terminate in a finite number of steps. For systems with continuous spaces, we show how to discretize the spaces and prove the convergence of the approximation when computing the finite-horizon PCISs. In addition, it is shown that an infinite-horizon PCIS can be computed by the stochastic backward reachable set from the RCIS contained in it. These PCIS algorithms are applicable to practical control systems. Simulations are given to illustrate the effectiveness of the theoretical results for motion planning.
OCMar 23, 2023
Policy Evaluation in Distributional LQRZifan Wang, Yulong Gao, Siyi Wang et al.
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call \emph{distributional LQR}. Specifically, we provide a closed-form expression of the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.
89.0SYApr 3
Data-Driven Synthesis of Probabilistic Controlled Invariant Sets for Linear MDPsKazumune Hashimoto, Shunki Kimura, Kazunobu Serizawa et al.
We study data-driven computation of probabilistic controlled invariant sets (PCIS) for safety-critical reinforcement learning under unknown dynamics. Assuming a linear MDP model, we use regularized least squares and self-normalized confidence bounds to construct a conservative estimate of the states from which the system can be kept inside a prescribed safe region over an \(N\)-step horizon, together with the corresponding set-valued safe action map. This construction is obtained through a backward recursion and can be interpreted as a conservative approximation of the \(N\)-step safety predecessor operator. When the associated conservative-inclusion event holds, a conservative fixed point of the approximate recursion can be certified as an \((N,ε)\)-PCIS with confidence at least \(η\). For continuous state spaces, we introduce a lattice abstraction and a Lipschitz-based discretization error bound to obtain a tractable approximation scheme. Finally, we use the resulting conservative fixed-point approximation as a runtime candidate PCIS in a practical shielding architecture with iterative updates, and illustrate the approach on a numerical experiment.
20.3LOMay 12
Ensuring Logic in the Fog: Sound POMDP Synthesis with LTL ObjectivesCan Zhou, Yulong Gao, Pian Yu
Synthesising autonomous agents that can navigate uncertain environments while adhering to complex temporal constraints remains a fundamental challenge. While Linear Temporal Logic (LTL) provides a rigorous language for specifying such tasks, the inherent undecidability of qualitatively verifying LTL satisfaction in partially observable Markov decision processes renders quantitative synthesis difficult, especially when designing reliable reward signals for approximate solvers. In this paper, we bridge this gap with a novel, sound reward-shaping mechanism that dynamically generates belief-dependent rewards grounded in certified LTL satisfaction. By integrating this mechanism into an enhanced Monte Carlo Planning framework, we empower agents to navigate the `fog' of partial observability with a search process focused on maximising verifiable success. Our experiments demonstrate that this approach not only thrives in scenarios where existing solvers fail but also maintains effectiveness and scalability across diverse benchmark domains.
IRDec 31, 2025
SEGB: Self-Evolved Generative Bidding with Local Autoregressive DiffusionYulong Gao, Wan Jiang, Mingzhe Cao et al.
In the realm of online advertising, automated bidding has become a pivotal tool, enabling advertisers to efficiently capture impression opportunities in real-time. Recently, generative auto-bidding has shown significant promise, offering innovative solutions for effective ad optimization. However, existing offline-trained generative policies lack the near-term foresight required for dynamic markets and usually depend on simulators or external experts for post-training improvement. To overcome these critical limitations, we propose Self-Evolved Generative Bidding (SEGB), a framework that plans proactively and refines itself entirely offline. SEGB first synthesizes plausible short-horizon future states to guide each bid, providing the agent with crucial, dynamic foresight. Crucially, it then performs value-guided policy refinement to iteratively discover superior strategies without any external intervention. This self-contained approach uniquely enables robust policy improvement from static data alone. Experiments on the AuctionNet benchmark and a large-scale A/B test validate our approach, demonstrating that SEGB significantly outperforms state-of-the-art baselines. In a large-scale online deployment, it delivered substantial business value, achieving a +10.19% increase in target cost, proving the effectiveness of our advanced planning and evolution paradigm.
CVNov 22, 2025
V2X-RECT: An Efficient V2X Trajectory Prediction Framework via Redundant Interaction Filtering and Tracking Error CorrectionXiangyan Kong, Xuecheng Wu, Xiongwei Zhao et al.
V2X prediction can alleviate perception incompleteness caused by limited line of sight through fusing trajectory data from infrastructure and vehicles, which is crucial to traffic safety and efficiency. However, in dense traffic scenarios, frequent identity switching of targets hinders cross-view association and fusion. Meanwhile, multi-source information tends to generate redundant interactions during the encoding stage, and traditional vehicle-centric encoding leads to large amounts of repetitive historical trajectory feature encoding, degrading real-time inference performance. To address these challenges, we propose V2X-RECT, a trajectory prediction framework designed for high-density environments. It enhances data association consistency, reduces redundant interactions, and reuses historical information to enable more efficient and accurate prediction. Specifically, we design a multi-source identity matching and correction module that leverages multi-view spatiotemporal relationships to achieve stable and consistent target association, mitigating the adverse effects of mismatches on trajectory encoding and cross-view feature fusion. Then we introduce traffic signal-guided interaction module, encoding trend of traffic light changes as features and exploiting their role in constraining spatiotemporal passage rights to accurately filter key interacting vehicles, while capturing the dynamic impact of signal changes on interaction patterns. Furthermore, a local spatiotemporal coordinate encoding enables reusable features of historical trajectories and map, supporting parallel decoding and significantly improving inference efficiency. Extensive experimental results across V2X-Seq and V2X-Traj datasets demonstrate that our V2X-RECT achieves significant improvements compared to SOTA methods, while also enhancing robustness and inference efficiency across diverse traffic densities.
LGNov 29, 2024
Risk-Averse Certification of Bayesian Neural NetworksXiyue Zhang, Zifan Wang, Yulong Gao et al.
In light of the inherently complex and dynamic nature of real-world environments, incorporating risk measures is crucial for the robustness evaluation of deep learning models. In this work, we propose a Risk-Averse Certification framework for Bayesian neural networks called RAC-BNN. Our method leverages sampling and optimisation to compute a sound approximation of the output set of a BNN, represented using a set of template polytopes. To enhance robustness evaluation, we integrate a coherent distortion risk measure--Conditional Value at Risk (CVaR)--into the certification framework, providing probabilistic guarantees based on empirical distributions obtained through sampling. We validate RAC-BNN on a range of regression and classification benchmarks and compare its performance with a state-of-the-art method. The results show that RAC-BNN effectively quantifies robustness under worst-performing risky scenarios, and achieves tighter certified bounds and higher efficiency in complex tasks.