LG · Dec 14, 2023

Improve Robustness of Reinforcement Learning against Observation Perturbations via $l_\infty$ Lipschitz Policy Networks

arXiv:2312.08751v1 · 11 citations · h-index: 4 · AAAI
AI Analysis

This work addresses the sensitivity of deep reinforcement learning agents to observation perturbations, a barrier to real-world deployment, and represents an incremental improvement over existing methods.

The paper tackles the vulnerability of deep reinforcement learning agents to small perturbations of their observations by proposing SortRL, a method whose policy network is globally Lipschitz continuous with respect to the l∞ norm, which bounds how much the network's outputs can change under a bounded input perturbation. In experiments on classic control tasks and video games, SortRL achieves state-of-the-art robustness across a range of perturbation strengths.
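To make "globally l∞ Lipschitz" concrete, here is a minimal sketch of one way to build a neuron that is 1-Lipschitz with respect to the l∞ norm of its input. It assumes a sort-based unit (a weighted sum of the sorted distances |x - θ|), chosen here only because of the method's name; the paper's actual layer, parameterization, and training are not described on this page and may differ. The names `sort_neuron`, `sort_layer`, and `linf` are hypothetical.

```python
from typing import List, Sequence


def sort_neuron(x: Sequence[float], theta: Sequence[float],
                w: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical sort-based neuron: weighted sum of sorted |x - theta|.

    If x moves by at most eps in l_inf, each |x_i - theta_i| moves by at most
    eps, and the k-th largest value also moves by at most eps. The weighted
    sum therefore moves by at most eps * sum(|w_j|), so ||w||_1 <= 1 makes
    the neuron 1-Lipschitz with respect to the l_inf norm of its input.
    """
    if len(x) != len(theta) or len(w) != len(x):
        raise ValueError("x, theta and w must have the same length")
    if sum(abs(v) for v in w) > 1.0 + 1e-12:
        raise ValueError("need ||w||_1 <= 1 for the 1-Lipschitz bound")
    dists = sorted((abs(a - b) for a, b in zip(x, theta)), reverse=True)
    return sum(wj * dj for wj, dj in zip(w, dists)) + bias


def sort_layer(x: Sequence[float], thetas, ws, biases) -> List[float]:
    # Every output unit is 1-Lipschitz, so the layer is 1-Lipschitz from
    # l_inf to l_inf; stacking such layers keeps the global bound at 1.
    return [sort_neuron(x, t, w, b) for t, w, b in zip(thetas, ws, biases)]


def linf(a: Sequence[float], b: Sequence[float]) -> float:
    """l_inf distance between two equal-length vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    x = [0.2, -0.5, 1.0]
    x_adv = [0.25, -0.45, 0.95]  # ||x - x_adv||_inf is about 0.05
    theta, w = [0.0, 0.0, 0.0], [0.5, 0.3, 0.2]
    gap = abs(sort_neuron(x, theta, w) - sort_neuron(x_adv, theta, w))
    print(gap <= linf(x, x_adv))  # True: output moved by at most eps
```

With `w = (1, 0, ..., 0)` the unit reduces to the plain distance ||x - θ||∞ + b; spreading weight over more sorted entries keeps the same Lipschitz bound while making the unit more expressive.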

Deep Reinforcement Learning (DRL) has achieved remarkable advances in sequential decision tasks. However, recent works have revealed that DRL agents are susceptible to slight perturbations in their observations. This vulnerability raises concerns about the effectiveness and robustness of deploying such agents in real-world applications. In this work, we propose a novel robust reinforcement learning method called SortRL, which improves the robustness of DRL policies against observation perturbations from the perspective of network architecture. We employ a novel policy network architecture that is globally Lipschitz continuous with respect to the $l_\infty$ norm, and we provide a convenient method to enhance policy robustness based on the output margin. In addition, we design a training framework for SortRL that solves the given tasks while maintaining robustness against $l_\infty$-bounded perturbations of the observations. We evaluate the method in several experiments, including classic control tasks and video games. The results demonstrate that SortRL achieves state-of-the-art robustness across different perturbation strengths.
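The link between Lipschitz continuity and the output margin can be sketched for a discrete action space. Assume every action logit is L-Lipschitz with respect to the l∞ norm of the observation. A perturbation with ||δ||∞ ≤ ε then moves each logit by at most Lε, so the gap between the top two logits shrinks by at most 2Lε, and the greedy action provably cannot change while the margin exceeds 2Lε. This is the standard margin argument for Lipschitz classifiers; whether SortRL uses exactly this bound is an assumption. The hinge-style `margin_hinge_loss` is likewise a hypothetical illustration of training toward a target margin, not the paper's objective, and all function names are hypothetical.

```python
from typing import Sequence


def output_margin(logits: Sequence[float]) -> float:
    """Gap between the largest and the second-largest logit."""
    if len(logits) < 2:
        raise ValueError("need at least two actions")
    first, second = sorted(logits, reverse=True)[:2]
    return first - second


def certified_radius(logits: Sequence[float], lipschitz: float = 1.0) -> float:
    """Perturbation size below which the greedy action cannot change.

    Assumes each logit is `lipschitz`-Lipschitz w.r.t. the l_inf norm of the
    observation, so any two logits can approach each other by at most
    2 * lipschitz * eps.
    """
    return output_margin(logits) / (2.0 * lipschitz)


def is_certified(logits: Sequence[float], eps: float,
                 lipschitz: float = 1.0) -> bool:
    # Strict inequality: at exactly 2 * L * eps the two logits could tie.
    return output_margin(logits) > 2.0 * lipschitz * eps


def margin_hinge_loss(logits: Sequence[float], action: int,
                      target_margin: float) -> float:
    """Hypothetical margin objective: zero once `action` leads every other
    logit by at least `target_margin`, linear penalty otherwise."""
    others = [z for j, z in enumerate(logits) if j != action]
    gap = logits[action] - max(others)
    return max(0.0, target_margin - gap)


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.25]
    print(certified_radius(logits))       # 0.25
    print(is_certified(logits, eps=0.2))  # True
```

Because the certificate only needs the logits and a known Lipschitz constant, it costs a single forward pass per observation; enlarging the margin during training is what translates directly into a larger certified radius.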
