LG | AI | Feb 22, 2021

Action Redundancy in Reinforcement Learning

arXiv:2102.11329v2 | 11 citations
AI Analysis

The paper addresses a fundamental issue in reinforcement learning, the efficiency and performance lost in tasks with redundant actions, but its contribution is incremental because it builds on existing MaxEnt methods.

The paper tackles action redundancy in reinforcement learning, where multiple actions lead to the same state transition. It proposes maximizing transition entropy instead of action entropy and develops algorithms that minimize redundancy, demonstrating their effectiveness on a synthetic environment and on benchmark environments in Atari and MuJoCo.

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization. However, action entropy does not necessarily coincide with state entropy, e.g., when multiple actions produce the same transition. Instead, we propose to maximize the transition entropy, i.e., the entropy of next states. We show that transition entropy can be described by two terms; namely, model-dependent transition entropy and action redundancy. Particularly, we explore the latter in both deterministic and stochastic settings and develop tractable approximation methods in a near model-free setup. We construct algorithms to minimize action redundancy and demonstrate their effectiveness on a synthetic environment with multiple redundant actions as well as contemporary benchmarks in Atari and Mujoco. Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
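
The gap between action entropy and next-state entropy can be illustrated with a small numeric sketch. This is not the paper's algorithm. It is a hypothetical single-state example with a deterministic transition map, and the action names, state names, and the helper functions `entropy` and `next_state_distribution` are assumptions made for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def next_state_distribution(policy, transition):
    """Push a policy over actions through a deterministic action -> next-state map."""
    dist = {}
    for action, prob in policy.items():
        nxt = transition[action]
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist


# Hypothetical environment: three of the four actions are redundant,
# since they all lead to the same next state.
transition = {"a0": "left", "a1": "left", "a2": "left", "a3": "right"}

# The uniform policy maximizes action entropy ...
uniform_policy = {a: 0.25 for a in transition}
action_entropy = entropy(uniform_policy.values())
# ... but visits "left" three times as often as "right".
transition_entropy = entropy(
    next_state_distribution(uniform_policy, transition).values()
)

# Down-weighting the redundant actions lowers action entropy
# while raising the entropy of the next state.
balanced_policy = {"a0": 1 / 6, "a1": 1 / 6, "a2": 1 / 6, "a3": 1 / 2}
balanced_action_entropy = entropy(balanced_policy.values())
balanced_transition_entropy = entropy(
    next_state_distribution(balanced_policy, transition).values()
)

print(f"uniform : H(a) = {action_entropy:.3f}, H(s') = {transition_entropy:.3f}")
print(f"balanced: H(a) = {balanced_action_entropy:.3f}, H(s') = {balanced_transition_entropy:.3f}")
```

In this toy setting the policy with the highest action entropy is not the one with the highest next-state entropy, which is the mismatch the abstract describes for actions that produce the same transition.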

