LG · ML · Nov 7, 2019

Option Compatible Reward Inverse Reinforcement Learning

arXiv:1911.02723v21 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of designing effective reward functions for reinforcement learning in complex environments, though it is incremental as it builds on existing inverse reinforcement learning and options frameworks.

The paper tackles the problem of recovering reward functions from expert demonstrations within a hierarchical options framework, resulting in a method that accelerates transfer learning tasks and is robust to noise in demonstrations.

Reinforcement learning in complex environments is a challenging problem. In particular, the success of reinforcement learning algorithms depends on a well-designed reward function. Inverse reinforcement learning (IRL) addresses the problem of recovering reward functions from expert demonstrations. In this paper, we solve a hierarchical inverse reinforcement learning problem within the options framework, which allows us to utilize the intrinsic motivation of the expert demonstrations. A gradient method for parametrized options is used to deduce a defining equation for the Q-feature space, which leads to a reward feature space. Using a second-order optimality condition for option parameters, an optimal reward function is selected. Experimental results in both discrete and continuous domains confirm that our recovered rewards provide a solution to the IRL problem using temporal abstraction and are, in turn, effective in accelerating transfer learning tasks. We also show that our method is robust to noise in expert demonstrations.
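The first-order step mentioned in the abstract can be illustrated in a simplified, flat (non-hierarchical) setting. This is a rough sketch under stated assumptions, not the authors' option-level derivation. By the policy gradient theorem, the gradient of the expected return is a sum over state-action pairs of occupancy times the score function times Q. If the expert's parameters are stationary, any admissible Q-function must therefore lie in the null space of the occupancy-weighted score matrix. The function name `q_feature_basis`, the tabular layout, and the random inputs below are all hypothetical and chosen only for illustration.

```python
import numpy as np

def q_feature_basis(grad_log_pi, occupancy, tol=1e-10):
    """Orthonormal basis of Q-vectors q with sum_j d_j * grad_log_pi_j * q_j = 0.

    grad_log_pi: (n_sa, n_params) score vectors, one row per state-action pair
                 (assumed tabular layout, not taken from the paper).
    occupancy:   (n_sa,) state-action occupancy weights d_j.
    """
    # A has shape (n_params, n_sa); column j is d_j * grad_log_pi_j.
    A = grad_log_pi.T * occupancy
    _, s, vt = np.linalg.svd(A, full_matrices=True)
    # Numerical rank: singular values above a relative threshold.
    rank = int(np.sum(s > tol * s[0]))
    # The remaining right singular vectors span the null space of A.
    return vt[rank:].T

# Toy inputs (random, purely illustrative).
rng = np.random.default_rng(0)
grad_log_pi = rng.normal(size=(6, 2))
occupancy = np.full(6, 1 / 6)

basis = q_feature_basis(grad_log_pi, occupancy)
residual = (grad_log_pi.T * occupancy) @ basis
print(basis.shape, np.abs(residual).max())
```

Each column of `basis` is a candidate Q-feature under which the toy policy has zero policy gradient. The paper's method applies an analogous condition to option parameters and then uses a second-order condition to select a reward, neither of which this sketch attempts.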

