LG ML Jul 19, 2019

Delegative Reinforcement Learning: learning to avoid traps with a little help

arXiv:1907.08461v1 7 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of avoiding traps in reinforcement learning. The advance is incremental: it builds on Posterior Sampling Reinforcement Learning, and its analysis covers only finite MDPs.

The paper tackles reinforcement learning in environments with traps, deriving a regret bound without episodic assumptions by allowing the algorithm to delegate actions to an external advisor. The result is a new setting called delegative reinforcement learning, demonstrated with a variant of Posterior Sampling Reinforcement Learning. The analysis is limited to finite MDPs, and the algorithm is not anytime because its parameters must be tuned to the target time discount.

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning). The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.
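To make the idea of "posterior sampling plus a delegation subroutine" concrete, here is a minimal Python sketch. It is not the paper's algorithm: the delegation rule, the function names `should_delegate` and `choose_action`, and the thresholds `eta` and `eps` are all assumptions chosen for illustration. The sketch assumes a finite set of hypotheses, a posterior over them, and precomputed per-hypothesis action values for the current state.

```python
import random


def should_delegate(action, posterior, q_values, eta=0.05, eps=0.1):
    """Illustrative rule (an assumption, not the paper's subroutine).

    Delegate if, under some hypothesis with posterior mass >= eta,
    `action` loses more than eps of value relative to that
    hypothesis's best action in the current state.

    posterior: dict hypothesis -> probability
    q_values:  dict hypothesis -> dict action -> value
    """
    for hyp, prob in posterior.items():
        if prob < eta:
            continue  # too unlikely to count as a plausible hypothesis
        best = max(q_values[hyp].values())
        if best - q_values[hyp][action] > eps:
            return True
    return False


def choose_action(posterior, q_values, advisor, rng, eta=0.05, eps=0.1):
    """Sample a hypothesis, act greedily under it, delegate if risky.

    Returns (action, delegated), where `delegated` is True when the
    advisor's action was used instead of the agent's own choice.
    """
    hyps = list(posterior)
    weights = [posterior[h] for h in hyps]
    sampled = rng.choices(hyps, weights=weights)[0]
    action = max(q_values[sampled], key=q_values[sampled].get)
    if should_delegate(action, posterior, q_values, eta, eps):
        return advisor(), True
    return action, False


# Hypothetical two-hypothesis example: each hypothesis says the other
# action is a trap (value 0).
q = {
    "h1": {"left": 1.0, "right": 0.0},
    "h2": {"left": 0.0, "right": 1.0},
}
uncertain = {"h1": 0.5, "h2": 0.5}
confident = {"h1": 0.99, "h2": 0.01}
```

With the `uncertain` posterior, either action could be a trap under a plausible hypothesis, so the agent always delegates. With the `confident` posterior, `left` is taken without delegation, while `right` would still be delegated.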
