OC LG Dec 21, 2024

A learning-based approach to stochastic optimal control under reach-avoid constraint

arXiv:2412.16561v35.62 citations h-index: 5 HSCC

Originality: Incremental advance

AI Analysis

This work addresses the computational complexity of constrained stochastic control, which is relevant to applications such as robotics and autonomous systems. It is an incremental advance because it builds on existing state-augmentation techniques.

The paper tackles the control of stochastic systems under reach-avoid constraints, where trajectories must stay within a safe set and reach a target set within a finite time horizon. It develops a model-free approach that combines state augmentation with a log-barrier policy gradient method. Under suitable assumptions, it proves that the policy parameters converge to the optimal parameters while the reach-avoid constraint is satisfied with high probability.
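
To make the summary concrete, the constrained problem and a log-barrier surrogate can be written as below. This is only a sketch in assumed notation, not the paper's own formulation: the symbols S (safe set), G (target set), T (horizon), \delta (tolerated violation probability), \theta (policy parameters), r (stage reward), and \eta (barrier weight) are introduced here for illustration, and the objective is written as reward maximization by assumption.

```latex
% Assumed notation (not taken from the paper): safe set S, target set G,
% horizon T, tolerance \delta, policy parameters \theta, barrier weight \eta > 0.
\begin{align*}
  \max_{\theta}\quad
    & J(\theta) = \mathbb{E}^{\pi_\theta}\Big[\sum_{t=0}^{T-1} r(x_t, u_t)\Big] \\
  \text{s.t.}\quad
    & p_{\mathrm{RA}}(\theta)
      = \mathbb{P}^{\pi_\theta}\big(\exists\, t \le T:\ x_t \in G,\
        x_s \in S\ \ \forall s < t\big) \ \ge\ 1 - \delta \\[4pt]
  \text{log-barrier surrogate:}\quad
    & B_\eta(\theta) = J(\theta)
      + \eta \log\big(p_{\mathrm{RA}}(\theta) - (1 - \delta)\big) \\
  \text{its gradient:}\quad
    & \nabla_\theta B_\eta(\theta) = \nabla_\theta J(\theta)
      + \frac{\eta\, \nabla_\theta p_{\mathrm{RA}}(\theta)}
             {p_{\mathrm{RA}}(\theta) - (1 - \delta)}
\end{align*}
```

The barrier term tends to minus infinity as the reach-avoid probability approaches the threshold from above, which is what keeps gradient iterates strictly feasible when the step size is small enough.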

We develop a model-free approach to optimally control stochastic, Markovian systems subject to a reach-avoid constraint. Specifically, the state trajectory must remain within a safe set while reaching a target set within a finite time horizon. Due to the time-dependent nature of these constraints, we show that, in general, the optimal policy for this constrained stochastic control problem is non-Markovian, which increases the computational complexity. To address this challenge, we apply the state-augmentation technique from arXiv:2402.19360, reformulating the problem as a constrained Markov decision process (CMDP) on an extended state space. This transformation allows us to search for a Markovian policy, avoiding the complexity of non-Markovian policies. To learn the optimal policy without a system model, and using only trajectory data, we develop a log-barrier policy gradient approach. We prove that under suitable assumptions, the policy parameters converge to the optimal parameters, while ensuring that the system trajectories satisfy the stochastic reach-avoid constraint with high probability.
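
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the construction from arXiv:2402.19360 or the paper's algorithm. It assumes one common form of augmentation, a three-valued status flag appended to the state, and assumes that hitting the target takes precedence over leaving the safe set. All names (`augment_step`, `rollout_status`, `estimate_reach_avoid`, `log_barrier_objective`, `delta`, `eta`) are hypothetical.

```python
import math
import random

# Status flag appended to the physical state (illustrative, assumed encoding).
ACTIVE, REACHED, FAILED = 0, 1, 2


def augment_step(status, next_state, is_safe, is_target):
    """Update the status flag of the augmented state (x, status).

    REACHED and FAILED are absorbing, so the flag summarizes the part of
    the history that the reach-avoid constraint depends on.
    """
    if status != ACTIVE:
        return status
    if is_target(next_state):  # assumption: target takes precedence
        return REACHED
    if not is_safe(next_state):
        return FAILED
    return ACTIVE


def rollout_status(policy, step, x0, horizon, is_safe, is_target, rng):
    """Simulate one trajectory and return the final status flag.

    The policy sees (x, status), so it is Markovian on the augmented state.
    """
    x = x0
    status = augment_step(ACTIVE, x0, is_safe, is_target)
    for _ in range(horizon):
        u = policy(x, status, rng)
        x = step(x, u, rng)
        status = augment_step(status, x, is_safe, is_target)
    return status


def estimate_reach_avoid(policy, step, x0, horizon, is_safe, is_target,
                         n_rollouts=1000, seed=0):
    """Monte Carlo estimate of the reach-avoid probability from rollouts."""
    rng = random.Random(seed)
    hits = sum(
        rollout_status(policy, step, x0, horizon, is_safe, is_target, rng)
        == REACHED
        for _ in range(n_rollouts)
    )
    return hits / n_rollouts


def log_barrier_objective(reward_est, prob_est, delta, eta):
    """Reward plus eta * log(slack), where slack = prob_est - (1 - delta).

    Returns -inf when the estimated constraint is violated or tight.
    """
    slack = prob_est - (1.0 - delta)
    if slack <= 0.0:
        return -math.inf
    return reward_est + eta * math.log(slack)
```

In this sketch the reach-avoid probability is the probability that the flag equals `REACHED` at the final time, so it can be estimated from trajectory data alone. A policy gradient method would then ascend an estimate of `log_barrier_objective` with respect to the policy parameters.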
