LG · Apr 22, 2024

Lipschitz-Regularized Critics Lead to Policy Robustness Against Transition Dynamics Uncertainty

Xulin Chen, Ruipeng Liu, Zhenyu Gan, Garrett E. Katz

arXiv:2404.13879v3 · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the robustness of RL policies in real-world deployment, but it is an incremental advance because it builds on existing methods such as PPO and Lipschitz regularization.

The paper tackles the problem of policy robustness against transition dynamics uncertainty in reinforcement learning by proposing PPO-PGDLC, which integrates a Lipschitz-regularized critic with adversarial training. On control and robotic tasks, it achieves better performance and produces smoother actions than the baselines.
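One common way to make a critic Lipschitz-regularized is a gradient penalty on the critic's derivative with respect to the state, λ·||∂V/∂s||². The abstract does not say which regularizer PPO-PGDLC uses, so the following is only a minimal sketch under that assumption. It uses a hypothetical linear critic V(s) = w·s + b fit to fixed value targets by plain gradient descent. For a linear critic ∂V/∂s = w, so the penalty reduces to λ·||w||² and the critic's Lipschitz constant is ||w||. The helper names (train_linear_critic, lipschitz_constant) are illustrative, not taken from the paper.

```python
# Minimal sketch (assumption): Lipschitz regularization as a gradient penalty
# lam * ||dV/ds||^2 on a hypothetical linear critic V(s) = w.s + b.
# For a linear critic dV/ds = w, so the penalty is lam * ||w||^2.


def train_linear_critic(states, targets, lam, lr=0.05, steps=2000):
    """Fit V(s) = w.s + b to value targets with MSE plus a gradient penalty."""
    dim = len(states[0])
    n = len(states)
    w = [0.0] * dim
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        # Gradient of the mean squared error against the value targets.
        for s, y in zip(states, targets):
            err = sum(wi * si for wi, si in zip(w, s)) + b - y
            for i in range(dim):
                grad_w[i] += 2.0 * err * s[i] / n
            grad_b += 2.0 * err / n
        # Gradient of the penalty lam * ||dV/ds||^2 = lam * ||w||^2.
        for i in range(dim):
            grad_w[i] += 2.0 * lam * w[i]
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def lipschitz_constant(w):
    """L2 Lipschitz constant of a linear critic: the norm of its weights."""
    return sum(wi * wi for wi in w) ** 0.5


# Toy data: targets follow V(s) = 3*s0 - 2*s1 exactly.
states = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.0, 3.0, -2.0, 1.0]

w_plain, _ = train_linear_critic(states, targets, lam=0.0)
w_reg, _ = train_linear_critic(states, targets, lam=0.5)
print("unregularized:", w_plain, lipschitz_constant(w_plain))
print("regularized:  ", w_reg, lipschitz_constant(w_reg))
```

On these four toy states, λ = 0 recovers w ≈ (3, -2) (Lipschitz constant about 3.61), while λ = 0.5 shrinks it to roughly (1, -0.67) (about 1.20). The penalty trades some fit for a flatter value function.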

Uncertainties in transition dynamics pose a critical challenge in reinforcement learning (RL), often degrading the performance of trained policies when they are deployed on hardware. Many robust RL approaches follow one of two strategies: enforcing smoothness in the actor or actor-critic modules with Lipschitz regularization, or learning robust Bellman operators. However, work following the first strategy has not investigated the impact of critic-only Lipschitz regularization on policy robustness, while work following the second lacks comprehensive validation in real-world scenarios. To address this gap, and building on prior work, we propose PPO-PGDLC, an algorithm based on Proximal Policy Optimization (PPO) that integrates Projected Gradient Descent (PGD) with a Lipschitz-regularized critic (LC). The PGD component computes the adversarial state within an uncertainty set to approximate the robust Bellman operator, and the Lipschitz-regularized critic further improves the smoothness of the learned policies. Experimental results on two classic control tasks and one real-world robotic locomotion task demonstrate that, compared to several baseline algorithms, PPO-PGDLC achieves better performance and predicts smoother actions under environmental perturbations.
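The PGD component can be illustrated as a worst-case next-state search. A robust Bellman target r + γ·min over s' in U of V(s') replaces the nominal next state with the lowest-value state in an uncertainty set U. The sketch below assumes U is an L∞ ball of radius eps around the observed next state, takes sign-gradient descent steps on the critic projected back onto that ball, and uses a hypothetical quadratic toy critic with an analytic gradient in place of a neural network. The uncertainty-set shape, step size, step count, and all function names are assumptions for illustration, not details from the paper.

```python
# Minimal sketch (assumptions): uncertainty set = L-infinity ball of radius eps
# around the nominal next state; a hypothetical quadratic toy critic stands in
# for the learned value network.


def critic(s, center):
    """Toy critic whose value peaks at `center` and falls off quadratically."""
    return -sum((si - ci) ** 2 for si, ci in zip(s, center))


def critic_grad(s, center):
    """Analytic gradient dV/ds of the toy critic."""
    return [-2.0 * (si - ci) for si, ci in zip(s, center)]


def sign(x):
    return (x > 0) - (x < 0)


def pgd_adversarial_state(s, center, eps, alpha, steps):
    """Sign-gradient PGD that searches the ball around s for a low-value state."""
    adv = list(s)
    for _ in range(steps):
        g = critic_grad(adv, center)
        # Descend the critic: step each coordinate against the sign of dV/ds.
        adv = [a - alpha * sign(gi) for a, gi in zip(adv, g)]
        # Project back onto the L-infinity ball [s - eps, s + eps].
        adv = [min(max(a, si - eps), si + eps) for a, si in zip(adv, s)]
    return adv


def robust_td_target(reward, next_state, center, gamma=0.99,
                     eps=0.1, alpha=0.025, steps=10):
    """Approximate r + gamma * min_{s' in U} V(s') with the PGD worst case."""
    adv_next = pgd_adversarial_state(next_state, center, eps, alpha, steps)
    return reward + gamma * critic(adv_next, center)


center = [0.0, 0.0]
next_state = [0.5, -0.3]
nominal = 1.0 + 0.99 * critic(next_state, center)
robust = robust_td_target(1.0, next_state, center)
print(pgd_adversarial_state(next_state, center, 0.1, 0.025, 10))
print("nominal target:", nominal, "robust target:", robust)
```

With next_state = (0.5, -0.3) and eps = 0.1, the search moves away from the toy critic's peak at the origin and settles at the ball's corner (0.6, -0.4), so the robust target is lower than the nominal one. Sign steps are a common PGD choice for L∞ balls because each step moves every coordinate by the same amount, matching the box-shaped constraint.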
