LG, IT
Jan 22, 2021

Differentiable Trust Region Layers for Deep Reinforcement Learning

Fabian Otto, Philipp Becker, Ngo Anh Vien, Hanna Carolin Ziesche, Gerhard Neumann

arXiv:2101.09207v2, 16.025 citations, Has Code

Originality: Incremental advance

AI Analysis

This work addresses a key bottleneck in deep reinforcement learning: enforcing trust regions during policy updates. It offers a more robust and implementation-agnostic approach, though the advance is incremental because it builds on existing trust region concepts.

The paper tackles the difficulty of enforcing trust regions in deep reinforcement learning by proposing differentiable neural network layers that perform closed-form projections for Gaussian policies, achieving similar or better results than existing methods like TRPO and PPO while being less sensitive to implementation choices.

Trust region methods are a popular tool in reinforcement learning as they yield robust policy updates in continuous and discrete action spaces. However, enforcing such trust regions in deep reinforcement learning is difficult. Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), are based on approximations. Due to those approximations, they violate the constraints or fail to find the optimal solution within the trust region. Moreover, they are difficult to implement, often lack sufficient exploration, and have been shown to depend on seemingly unrelated implementation choices. In this work, we propose differentiable neural network layers to enforce trust regions for deep Gaussian policies via closed-form projections. Unlike existing methods, those layers formalize trust regions for each state individually and can complement existing reinforcement learning algorithms. We derive trust region projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices. The code is available at https://git.io/Jthb0.
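To make the idea of a per-state trust region concrete, here is a minimal NumPy sketch. It is not the authors' implementation and does not reproduce the projections derived in the paper. It assumes full-covariance Gaussians, and the function names `gaussian_kl` and `project_mean` are hypothetical. The first function evaluates the standard closed-form KL divergence between two Gaussians. The second is a simplified stand-in that projects only the mean onto a squared Euclidean ball around the old mean and leaves the covariance untouched.

```python
import numpy as np


def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between two multivariate Gaussians."""
    k = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    trace_term = np.trace(cov_q_inv @ cov_p)
    maha_term = diff @ cov_q_inv @ diff
    # slogdet returns (sign, log|det|); the log-determinant is more stable than log(det).
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + maha_term - k + logdet_q - logdet_p)


def project_mean(mu, mu_old, eps):
    """Simplified illustration (an assumption, not the paper's projection):
    project mu onto the ball {m : ||m - mu_old||^2 <= eps} for one state."""
    diff = mu - mu_old
    sq_dist = float(diff @ diff)
    if sq_dist <= eps:
        # The constraint already holds, so the mean is returned unchanged.
        return mu.copy()
    # Otherwise rescale the step so the result lies exactly on the boundary.
    return mu_old + diff * np.sqrt(eps / sq_dist)


# Usage for a single state: old and new policy means with identity covariance.
mu_old = np.zeros(2)
mu_new = np.array([3.0, 4.0])
cov = np.eye(2)
mu_proj = project_mean(mu_new, mu_old, eps=1.0)
kl_before = gaussian_kl(mu_new, cov, mu_old, cov)
kl_after = gaussian_kl(mu_proj, cov, mu_old, cov)
```

Because both operations are closed-form and built from differentiable primitives, the same computation could in principle be written in an automatic differentiation framework and applied to every state in a batch, which is the property the abstract emphasizes.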
