SY LG | Apr 3, 2025

MAD: A Magnitude And Direction Policy Parametrization for Stability Constrained Reinforcement Learning

Luca Furieri, Sucheth Shenoy, Danilo Saccani, Andrea Martin, Giancarlo Ferrari-Trecate

arXiv:2504.02565v25.15 citations | h-index: 14 | Has Code | CDC

Originality: Incremental advance

AI Analysis

The paper addresses stability constraints in RL for control systems. Its policy parametrization is an incremental advance that combines existing stability-preserving methods with state-feedback components.

The paper tackles the problem of ensuring closed-loop stability in reinforcement learning for nonlinear dynamical systems. It introduces magnitude and direction (MAD) policies, which match the performance of standard neural network policies while guaranteeing stability by design.

We introduce magnitude and direction (MAD) policies, a policy parameterization for reinforcement learning (RL) that preserves Lp closed-loop stability for nonlinear dynamical systems. Despite their completeness in describing all stabilizing controllers, methods based on nonlinear Youla and system-level synthesis are significantly impacted by the difficulty of parametrizing Lp-stable operators. In contrast, MAD policies introduce explicit feedback on state-dependent features - a key element behind the success of reinforcement learning pipelines - without jeopardizing closed-loop stability. This is achieved by letting the magnitude of the control input be described by a disturbance-feedback Lp-stable operator, while selecting its direction based on state-dependent features through a universal function approximator. We further characterize the robust stability properties of MAD policies under model mismatch. Unlike existing disturbance-feedback policy parametrizations, MAD policies introduce state-feedback components compatible with model-free RL pipelines, ensuring closed-loop stability with no model information beyond assuming open-loop stability. Numerical experiments show that MAD policies trained with deep deterministic policy gradient (DDPG) methods generalize to unseen scenarios - matching the performance of standard neural network policies while guaranteeing closed-loop stability by design.
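The magnitude/direction split described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the exact policy form in the paper may differ. Everything here is an assumption made for illustration: the toy open-loop stable plant `f`, the stable linear filter (`A`, `B`, `C`) standing in for the Lp-stable magnitude operator, and the small random `tanh` network standing in for the direction approximator. The point it shows is that a direction term bounded in (-1, 1) cannot make the input larger than the magnitude signal, so the input decays whenever the magnitude does.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u, n_h = 2, 1, 8  # state, input, and hidden sizes (illustrative)

# Hypothetical open-loop stable plant: x_{t+1} = f(x_t, u_t) + w_t.
# 0.8 * tanh(.) is a contraction, so the unforced system is stable.
def f(x, u):
    return 0.8 * np.tanh(x) + np.array([0.0, 1.0]) * u[0]

# Magnitude: a stable linear filter driven by reconstructed disturbances.
# Rescaling A to spectral radius 0.9 makes the filter stable (assumed
# stand-in for a learned Lp-stable operator).
A = rng.standard_normal((n_h, n_h))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))
B = rng.standard_normal((n_h, n_x))
C = rng.standard_normal((n_u, n_h))

# Direction: a small random network of the state with outputs in (-1, 1).
W1 = rng.standard_normal((n_h, n_x))
W2 = rng.standard_normal((n_u, n_h))

def direction(x):
    return np.tanh(W2 @ np.tanh(W1 @ x))

def rollout(x0, T=300):
    x, xi = x0.copy(), np.zeros(n_h)
    w_hat = x0.copy()  # initial condition treated as the disturbance at t = 0
    us, ms, xs = [], [], []
    for t in range(T):
        xi = A @ xi + B @ w_hat           # filter state update
        m = C @ xi                        # magnitude signal (disturbance feedback)
        u = np.abs(m) * direction(x)      # state feedback only picks the direction
        w_true = 0.5 ** (t + 1) * rng.standard_normal(n_x)  # decaying disturbance
        x_next = f(x, u) + w_true
        w_hat = x_next - f(x, u)          # reconstruct disturbance from the model
        us.append(u); ms.append(m); xs.append(x)
        x = x_next
    return np.array(us), np.array(ms), np.array(xs)

us, ms, xs = rollout(np.array([1.0, -2.0]))
```

In this sketch the elementwise bound `|u_t| <= |m_t|` holds regardless of the weights `W1` and `W2`, which is why the direction network can be trained freely (for example with DDPG, as in the paper's experiments) without affecting the decay of the input.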
