LG · AI · ML · May 28, 2020

Domain Knowledge Integration By Gradient Matching For Sample-Efficient Reinforcement Learning

arXiv:2005.13778v1
AI Analysis

This incremental improvement addresses sample efficiency for reinforcement learning practitioners, but does not introduce a new paradigm.

The paper tackles the sample inefficiency of model-free deep reinforcement learning by matching the model-free learner's gradients to slope information from a learned dynamics predictor. The authors report improved sample efficiency in their experiments.

Model-free deep reinforcement learning (RL) agents can learn an effective policy directly from repeated interactions with a black-box environment. However, in practice, the algorithms often require large amounts of training experience to learn and generalize well. In addition, classic model-free learning ignores the domain information contained in the state transition tuples. Model-based RL, on the other hand, attempts to learn a model of the environment from experience and is substantially more sample efficient, but suffers from significantly large asymptotic bias owing to the imperfect dynamics model. In this paper, we propose a gradient matching algorithm to improve sample efficiency by utilizing target slope information from the dynamics predictor to aid the model-free learner. We demonstrate this by presenting a technique for matching the gradient information from the model-based learner with the model-free component in an abstract low-dimensional space and validate the proposed technique through experimental results that demonstrate the efficacy of this approach.
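
The abstract does not spell out the loss, so the following is only a minimal sketch of the general idea of gradient matching, not the paper's algorithm. It assumes a hypothetical model-free head `value(w, z)` over a two-dimensional latent `z`, and a made-up `target_slope` vector standing in for the slope information a dynamics predictor would supply. The loss penalizes the squared distance between the learner's slope with respect to `z` and that target. In a full agent this term would presumably be added to the usual RL objective, which is omitted here.

```python
import math

def value(w, z):
    """Hypothetical model-free head: q(z) = sum_i w_i * tanh(z_i)."""
    return sum(wi * math.tanh(zi) for wi, zi in zip(w, z))

def value_slope(w, z):
    """Analytic gradient of value() with respect to the latent z."""
    return [wi * (1.0 - math.tanh(zi) ** 2) for wi, zi in zip(w, z)]

def matching_loss(w, z, target_slope):
    """Squared distance between the learner's slope and the target slope."""
    return sum((s - t) ** 2 for s, t in zip(value_slope(w, z), target_slope))

def matching_step(w, z, target_slope, lr=0.1):
    """One gradient-descent step on matching_loss with respect to w."""
    new_w = []
    for wi, zi, ti in zip(w, z, target_slope):
        d = 1.0 - math.tanh(zi) ** 2        # d(slope_i) / d(w_i)
        grad = 2.0 * (wi * d - ti) * d      # d(loss) / d(w_i)
        new_w.append(wi - lr * grad)
    return new_w

# Assumed toy values: a latent state and a slope the dynamics
# predictor is imagined to have produced at that state.
z = [0.3, -0.5]
target_slope = [0.8, -0.2]

w = [0.0, 0.0]
initial = matching_loss(w, z, target_slope)
for _ in range(200):
    w = matching_step(w, z, target_slope)
final = matching_loss(w, z, target_slope)
```

After the loop, `final` is far smaller than `initial`, meaning the learner's slope in the latent space has been pulled onto the target slope. The head is deliberately simple so the gradients can be written by hand; with a neural network the same penalty would normally be computed by automatic differentiation.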
