LG SY May 19, 2024

On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks

Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester

arXiv:2405.11432v3 10.412 citations h-index: 34 Has Code

Originality: Incremental advance

AI Analysis

This work addresses robustness issues in reinforcement learning for applications like robotics and gaming, but it is incremental as it builds on existing Lipschitz methods.

The paper studies robust policy networks in deep reinforcement learning by analyzing Lipschitz-bounded parameterizations. It finds that these policies are more robust to disturbances and attacks than unconstrained policies, and that more expressive layers such as the Sandwich layer achieve a better trade-off between robustness and clean performance than spectral normalization.

This paper presents a study of robust policy networks in deep reinforcement learning. We investigate the benefits of policy parameterizations that naturally satisfy constraints on their Lipschitz bound, analyzing their empirical performance and robustness on two representative problems: pendulum swing-up and Atari Pong. We illustrate that policy networks with smaller Lipschitz bounds are more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained policies composed of vanilla multi-layer perceptrons or convolutional neural networks. However, the structure of the Lipschitz layer is important. We find that the widely-used method of spectral normalization is too conservative and severely impacts clean performance, whereas more expressive Lipschitz layers such as the recently-proposed Sandwich layer can achieve improved robustness without sacrificing clean performance.
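As a rough illustration of what a Lipschitz bound on a policy network means, the sketch below estimates the spectral norm (largest singular value) of each weight matrix by power iteration and multiplies these norms to get an upper bound on the network's Lipschitz constant, assuming 1-Lipschitz activations such as ReLU or tanh. It also shows plain spectral normalization, which rescales each layer to a target norm. This is not the paper's code and does not implement the Sandwich layer; the function names and the toy weight matrices are hypothetical and chosen only for illustration.

```python
import math
import random


def spectral_norm(W, iters=200, seed=0):
    """Estimate the largest singular value of W (a list of rows)
    by power iteration on W^T W."""
    rng = random.Random(seed)
    n_rows, n_cols = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = W v
        v = [sum(W[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v]
    u = [sum(W[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(x * x for x in u))  # ||W v|| with ||v|| = 1


def spectrally_normalize(W, gamma=1.0):
    """Rescale W so that its spectral norm is (approximately) gamma."""
    sigma = spectral_norm(W)
    if sigma == 0.0:
        return [row[:] for row in W]
    return [[gamma * w / sigma for w in row] for row in W]


def mlp_lipschitz_upper_bound(weights):
    """Product of layer spectral norms: an upper bound on the L2 Lipschitz
    constant of an MLP, assuming every activation is 1-Lipschitz."""
    bound = 1.0
    for W in weights:
        bound *= spectral_norm(W)
    return bound


# Toy two-layer "policy" weights (hypothetical values).
W1 = [[3.0, 0.0], [0.0, 1.0]]
W2 = [[2.0, 0.0], [0.0, 0.5]]

unconstrained_bound = mlp_lipschitz_upper_bound([W1, W2])  # about 3 * 2 = 6
normalized_bound = mlp_lipschitz_upper_bound(
    [spectrally_normalize(W1), spectrally_normalize(W2)]
)  # about 1 * 1 = 1
```

A smaller bound limits how far the policy output can move under a bounded input perturbation, which is the robustness mechanism the abstract refers to. The product of per-layer norms is generally a loose bound, which is consistent with the abstract's observation that spectral normalization is conservative.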
