LGAIJul 12, 2021

A Simple Reward-free Approach to Constrained Reinforcement Learning

arXiv:2107.05216v136 citations
Originality Incremental advance
AI Analysis

This work addresses constrained RL problems, such as safety or budget constraints, for researchers and practitioners in reinforcement learning, offering a simpler and more efficient approach by leveraging existing reward-free methods, though it is incremental as it builds on prior reward-free RL techniques.

The paper tackles constrained reinforcement learning (RL) by bridging it with reward-free RL, proposing a meta-algorithm that uses any reward-free RL oracle to solve constrained RL problems with minimal overhead in sample complexity. The framework achieves sample complexity results matching the best existing ones in tabular MDPs, extends to two-player Markov games, and provides new results for linear function approximation.

In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the overall reward but also satisfy the additional safety, diversity, or budget constraints. Consequently, existing constrained RL solutions require several new algorithmic ingredients that are notably different from standard RL. On the other hand, reward-free RL is independently developed in the unconstrained literature, which learns the transition dynamics without using the reward information, and thus naturally capable of addressing RL with multiple objectives under the common dynamics. This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity. Utilizing the existing reward-free RL solvers, our framework provides sharp sample complexity results for constrained RL in the tabular MDP setting, matching the best existing results up to a factor of horizon dependence; our framework directly extends to a setting of tabular two-player Markov games, and gives a new result for constrained RL with linear function approximation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes