LG · Jul 12, 2021

Behavior Constraining in Weight Space for Offline Reinforcement Learning

Phillip Swazinna, Steffen Udluft, Daniel Hein, Thomas Runkler

arXiv:2107.05479v16.54 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses policy regularization in offline RL, but it appears incremental: it changes where the constraint is applied rather than introducing a new paradigm.

The paper tackles learning policies from a fixed dataset in offline reinforcement learning. It proposes an algorithm that constrains the policy directly in weight space and reports experiments demonstrating its effectiveness.

In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are thus regularized during training to behave similarly to the data generating policy, by adding a penalty based on a divergence between action distributions of generating and trained policy. We propose a new algorithm, which constrains the policy directly in its weight space instead, and demonstrate its effectiveness in experiments.
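To make the idea of a weight-space constraint concrete, here is a minimal sketch and not the authors' implementation. The abstract does not state the exact form of the constraint, so the squared Euclidean distance to the weights of a behavior-cloned policy, the toy quadratic return, and every name below (`weight_space_penalty`, `constrained_update`, `train`, `lam`) are assumptions chosen for illustration.

```python
import numpy as np

def weight_space_penalty(theta, theta_behavior):
    """Squared L2 distance between policy weights and behavior-policy weights.

    Assumption: the paper may use a different distance or a hard constraint.
    """
    diff = theta - theta_behavior
    return float(diff @ diff)

def constrained_update(theta, theta_behavior, grad_return, lr=0.1, lam=1.0):
    """One gradient-ascent step on: return(theta) - lam * ||theta - theta_behavior||^2."""
    grad_penalty = 2.0 * (theta - theta_behavior)
    return theta + lr * (grad_return - lam * grad_penalty)

# Hypothetical toy problem: weights of a behavior-cloned policy and the
# weights that would maximize an (estimated) return if left unconstrained.
theta_behavior = np.array([0.0, 0.0])
theta_star = np.array([1.0, -2.0])

def toy_return_grad(theta):
    """Gradient of the toy return -||theta - theta_star||^2."""
    return -2.0 * (theta - theta_star)

def train(lam, steps=200):
    """Start from the behavior weights and apply penalized updates."""
    theta = theta_behavior.copy()
    for _ in range(steps):
        theta = constrained_update(
            theta, theta_behavior, toy_return_grad(theta), lam=lam
        )
    return theta

if __name__ == "__main__":
    for lam in (0.0, 1.0, 3.0):
        theta = train(lam)
        print(lam, theta, weight_space_penalty(theta, theta_behavior))
```

In this toy setting a larger `lam` keeps the trained weights closer to the behavior weights, whereas the divergence-based penalties mentioned in the abstract compare action distributions instead of weights.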
