LG AI OC PR ML · Nov 28, 2024

Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints

Pekka Malo, Lauri Viitasaari, Antti Suominen, Eeva Vilkkumaa, Olli Tahvonen

arXiv:2411.19193v2 · 4.6 · 1 citation · h-index: 44

Originality: Incremental advance

AI Analysis

This work addresses safety-critical applications like autonomous systems and finance, offering theoretical insights for safe RL in high-dimensional settings, though it appears incremental as it builds on existing regularization and mean-field methods.

The paper tackles reinforcement learning with safety constraints in infinite-horizon decision processes. It proposes a doubly-regularized framework that combines reward and parameter regularization, and it establishes exponential convergence guarantees when the regularization is sufficiently strong.
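For orientation, a doubly-regularized objective of this kind can be written in the following generic form. This is an illustrative sketch, not the paper's exact formulation. The symbols are assumptions: $\pi_\mu(\cdot \mid s)$ is the policy induced by a parameter distribution $\mu$ and is supported on a safe action set $\mathcal{A}(s)$; $\gamma \in (0,1)$ is the discount factor; $\tau \ge 0$ weights the reward (entropy) regularization; $\lambda \ge 0$ weights the parameter regularization toward a reference measure $\nu$.

$$
\max_{\mu}\; J_{\tau,\lambda}(\mu) \;=\; \mathbb{E}^{\pi_\mu}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t,a_t) - \tau \log \pi_\mu(a_t \mid s_t)\Big)\right] \;-\; \lambda\, \mathrm{KL}(\mu \,\|\, \nu),
\qquad a_t \in \mathcal{A}(s_t)\ \text{almost surely.}
$$

Setting $\tau = 0$ removes the reward regularization, and setting $\lambda = 0$ removes the parameter regularization. The "sufficient regularization" condition concerns how large these weights must be for the convergence guarantees to hold.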

This paper examines reinforcement learning (RL) in infinite-horizon decision processes with almost-sure safety constraints, which are crucial for applications such as autonomous systems, finance, and resource management. We propose a doubly-regularized RL framework that combines reward and parameter regularization to address safety constraints in continuous state-action spaces. The problem is formulated as a convex regularized objective with parametrized policies in the mean-field regime. Leveraging mean-field theory and Wasserstein gradient flows, we model policies on an infinite-dimensional statistical manifold, with updates governed by gradient flows of the parameter distribution. Key contributions include solvability conditions for safety-constrained problems, smooth bounded approximations for gradient flows, and exponential convergence guarantees under sufficient regularization. General regularization conditions, including entropy regularization, support practical particle-method implementations. This framework provides robust theoretical insights and guarantees for safe RL in complex, high-dimensional settings.
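To illustrate what a particle-method implementation of an entropy-regularized Wasserstein gradient flow looks like, here is a minimal sketch of noisy particle gradient descent, also known as mean-field Langevin dynamics. It is a toy, hypothetical setup rather than the authors' algorithm, and it drops the safety constraint. The problem is one-step: the reward is `r(a) = -(a - a_star)**2 / 2`, and the Gaussian policy's mean is the average of the particle parameters. The loss adds an L2 parameter penalty `lam` and an entropy term of weight `sigma**2 / 2`. Combined, these two terms equal `(sigma**2 / 2) * KL(mu || N(0, sigma**2 / lam))` up to a constant. The function name and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def mean_field_langevin(n_particles=500, steps=2000, lr=0.01,
                        a_star=2.0, lam=1.0, sigma=0.5, seed=0):
    """Hypothetical toy: Euler-Maruyama particle scheme for mean-field Langevin dynamics.

    Minimizes F(mu) + (sigma**2 / 2) * Ent(mu), where
        F(mu) = (mean_mu(theta) - a_star)**2 / 2 + lam / 2 * E_mu[theta**2].
    The first term is the negative expected reward of a Gaussian policy whose
    mean is the mean-field average of theta (up to a constant); the entropy
    term is handled by injecting Gaussian noise into every particle update.
    """
    rng = random.Random(seed)
    # Start the empirical measure far from the optimum.
    theta = [rng.gauss(-3.0, 1.0) for _ in range(n_particles)]
    noise_scale = sigma * math.sqrt(lr)
    for _ in range(steps):
        m_bar = sum(theta) / n_particles  # policy mean = empirical mean-field average
        # grad_theta of the first variation dF/dmu at theta: (m_bar - a_star) + lam * theta
        theta = [
            t - lr * ((m_bar - a_star) + lam * t) + noise_scale * rng.gauss(0.0, 1.0)
            for t in theta
        ]
    return theta


if __name__ == "__main__":
    particles = mean_field_langevin()
    # Continuous-time stationary law: mean a_star / (1 + lam), variance sigma**2 / (2 * lam).
    print(statistics.fmean(particles), statistics.pvariance(particles))
```

With the defaults, the particle mean should settle near `a_star / (1 + lam) = 1.0`, and the particle variance should settle near `sigma**2 / (2 * lam) = 0.125`, up to discretization and sampling error. Because `lam > 0` makes the regularized objective strongly convex, the error contracts geometrically. This mirrors, in a very simple setting, the link between sufficient regularization and exponential convergence.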
