88.0SYMay 8Code
Online Adaptive Probabilistic Safety Certificate with Language GuidanceZhuoyuan Wang, Xiyu Deng, Hikaru Hoshino et al. · cmu
Achieving long-term safety in uncertain/extreme environments while accounting for human preferences remains a fundamental challenge for autonomous systems. Existing methods often trade off long-term guarantees for fast real-time control and cannot adapt to variability in human preferences or risk tolerance. To address these limitations, we propose a language-guided adaptive probabilistic safety certificate (PSC) framework that guarantees long-term safety for stochastic systems under environmental uncertainty while accommodating diverse human preferences. The proposed framework integrates natural-language inputs from users and Bayesian estimators of the environment into adaptive safety certificates that explicitly account for user preferences, system dynamics, and quantified uncertainties. Our key technical innovation leverages probabilistic invariance--a generalization of forward invariance to a probability space--to obtain myopic safety conditions with long-term safety guarantees. We validate the framework through numerical simulations of autonomous lane-keeping with human-in-the-loop guidance under uncertain and extreme road conditions, demonstrating enhanced safety-performance trade-offs, adaptability to changing environments, and personalization to different user preferences. Code is available at https://github.com/hoshino06/adaptive_lane_keeping.
6.7SYMay 13
Optimal design of solar-battery hybrid resources considering multi-market participation under weather and price uncertaintyHikaru Hoshino, Taiyo Mantani, Eiko Furutani
The rapid growth of variable renewable energy has increased the need for flexible and efficiently coordinated energy resources. In this context, hybrid resources that combine renewable generation and battery storage within a single market-participating entity have attracted growing attention. Such hybrid resources can have multiple revenue streams, while allocating limited power and energy capacity across multiple electricity markets including energy and ancillary services. This multi-market coordination increases operational complexity and complicates profitability assessment, making optimal system sizing a challenging design problem. In addition, uncertainty in renewable generation and market prices makes it difficult for conventional optimization approaches to determine system designs that remain effective under stochastic operating conditions. To address these challenges, this paper proposes a deep reinforcement learning-based co-optimization framework for hybrid solar-battery resources. The framework embeds system design variables directly into the policy learning process, enabling joint optimization of hybrid system sizing and coordinated multi-market bidding strategies within a unified stochastic formulation. Case studies using historical renewable generation and market data demonstrate the effectiveness of the proposed framework in identifying economically rational hybrid system design considering multi-market operation.
SYMar 25, 2024
Physics-informed RL for Maximal Safety Probability EstimationHikaru Hoshino, Yorie Nakahira · cmu
Accurate risk quantification and reachability analysis are crucial for safe control and learning, but sampling from rare events, risky states, or long-term trajectories can be prohibitively costly. Motivated by this, we study how to estimate the long-term safety probability of maximally safe actions without sufficient coverage of samples from risky states and long-term trajectories. The use of maximal safety probability in control and learning is expected to avoid conservative behaviors due to over-approximation of risk. Here, we first show that long-term safety probability, which is multiplicative in time, can be converted into additive costs and be solved using standard reinforcement learning methods. We then derive this probability as solutions of partial differential equations (PDEs) and propose Physics-Informed Reinforcement Learning (PIRL) algorithm. The proposed method can learn using sparse rewards because the physics constraints help propagate risk information through neighbors. This suggests that, for the purpose of extracting more information for efficient learning, physics constraints can serve as an alternative to reward shaping. The proposed method can also estimate long-term risk using short-term samples and deduce the risk of unsampled states. This feature is in stark contrast with the unconstrained deep RL that demands sufficient data coverage. These merits of the proposed method are demonstrated in numerical simulation.