LG · RO · Jul 24, 2017

Bellman Gradient Iteration for Inverse Reinforcement Learning

arXiv:1707.07767v1 · 9 citations
Originality: Incremental advance
AI Analysis

The paper addresses reward function inference, a problem of interest to researchers and practitioners in reinforcement learning. Its contribution is incremental: it builds on existing methods, and its main improvement is flexibility.

The paper tackles the core problem of inverse reinforcement learning: recovering a reward function from an agent's observed actions. It introduces a Bellman Gradient Iteration method and two approximations of the Bellman Optimality Equation. The resulting method matches the accuracy of state-of-the-art approaches while being more flexible.
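The central trick can be sketched in standard MDP notation. The symbols, the state-only reward r(s), and the two particular smooth maxima shown here are illustrative assumptions rather than the paper's exact formulation. The non-differentiable max in the Bellman Optimality Equation is replaced by a smooth stand-in, and the same recursion is then differentiated with respect to the reward:

```latex
% Bellman Optimality Equation (state reward r(s), discount \gamma, transitions P)
Q(s,a) = r(s) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s'), \qquad V(s) = \max_{a} Q(s,a)

% Two smooth stand-ins for the max; both approach the max as k \to \infty
V(s) \approx \frac{1}{k}\log \sum_{a} e^{k\,Q(s,a)}
\qquad\text{or}\qquad
V(s) \approx \Big(\sum_{a} Q(s,a)^{k}\Big)^{1/k} \quad (Q \ge 0)

% Differentiating the log-sum-exp version yields a recursion for the gradient
\frac{\partial Q(s,a)}{\partial r} = \frac{\partial r(s)}{\partial r}
  + \gamma \sum_{s'} P(s' \mid s,a)\, \frac{\partial V(s')}{\partial r},
\qquad
\frac{\partial V(s)}{\partial r} = \sum_{a} \frac{e^{k\,Q(s,a)}}{\sum_{b} e^{k\,Q(s,b)}}\, \frac{\partial Q(s,a)}{\partial r}
```

Because the gradient obeys a Bellman-style recursion, it can be computed by iterating alongside value iteration. This is the role the name "Bellman Gradient Iteration" suggests.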

This paper develops an inverse reinforcement learning algorithm that recovers a reward function from the observed actions of an agent. We introduce a strategy that flexibly handles different types of actions using two approximations of the Bellman Optimality Equation. We also introduce a Bellman Gradient Iteration method that computes the gradient of the Q-value with respect to the reward function. Together, these methods establish a differentiable relation between the Q-value and the reward function, so that an approximately optimal reward function can be learned with gradient methods. We test the proposed method in two simulated environments, evaluating the accuracy of the different approximations and comparing the method with existing solutions. The results show that, even with a linear reward function, the proposed method achieves accuracy comparable to that of the state-of-the-art method, which uses a non-linear reward function. The proposed method is also more flexible, because it is defined on observed actions rather than on trajectories.
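The pipeline described in the abstract can be sketched on a small tabular MDP. It has three stages: a smoothed Bellman backup, the gradient of Q with respect to the reward, and gradient ascent on the likelihood of observed actions. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made here:

- the log-sum-exp stand-in for the max;
- a Boltzmann action likelihood with inverse temperature `beta`;
- a per-state reward vector, i.e., a linear reward with one-hot features;
- the function names `bellman_gradient_iteration`, `log_likelihood_and_grad`, and `learn_reward`.

The sketch requires NumPy.

```python
import numpy as np


def soft_max_over_actions(Q, k):
    """Log-sum-exp stand-in for max_a Q(s, a) (an assumed approximation).

    Returns V with shape (S,) and W = dV(s)/dQ(s, a) with shape (S, A).
    """
    q_max = Q.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(k * (Q - q_max))
    z = e.sum(axis=1, keepdims=True)
    V = q_max[:, 0] + np.log(z[:, 0]) / k
    W = e / z  # softmax(k * Q) over actions
    return V, W


def bellman_gradient_iteration(P, r, gamma, k, iters=200):
    """Iterate the smoothed Bellman equation and its reward gradient together.

    P: (A, S, S) transition probabilities P[a, s, s'].
    r: (S,) per-state reward (hypothetical tabular parameterization).
    Returns Q (S, A) and dQ (S, A, S), where dQ[s, a, u] = dQ(s, a)/dr(u).
    """
    S = P.shape[1]
    V = np.zeros(S)
    dV = np.zeros((S, S))
    eye = np.eye(S)
    for _ in range(iters):
        # Q(s,a) = r(s) + gamma * sum_s' P(s'|s,a) V(s')
        Q = r[:, None] + gamma * np.einsum("ast,t->sa", P, V)
        # dQ(s,a)/dr = e_s + gamma * sum_s' P(s'|s,a) dV(s')/dr
        dQ = eye[:, None, :] + gamma * np.einsum("ast,tu->sau", P, dV)
        V, W = soft_max_over_actions(Q, k)
        # Chain rule through the smooth max: dV(s)/dr = sum_a W(s,a) dQ(s,a)/dr
        dV = np.einsum("sa,sau->su", W, dQ)
    return Q, dQ


def log_likelihood_and_grad(P, r, demos, gamma=0.9, k=10.0, beta=1.0):
    """Log-likelihood of observed (state, action) pairs under an assumed
    Boltzmann policy pi(a|s) proportional to exp(beta * Q(s, a)),
    together with its gradient with respect to r."""
    Q, dQ = bellman_gradient_iteration(P, r, gamma, k)
    ll = 0.0
    grad = np.zeros_like(r)
    for s, a in demos:
        logits = beta * (Q[s] - Q[s].max())
        p = np.exp(logits) / np.exp(logits).sum()
        ll += np.log(p[a])
        # d log pi(a|s)/dr = beta * (dQ(s,a)/dr - sum_b pi(b|s) dQ(s,b)/dr)
        grad += beta * (dQ[s, a] - p @ dQ[s])
    return ll, grad


def learn_reward(P, demos, steps=100, lr=0.01, **kwargs):
    """Plain gradient ascent on the demonstration log-likelihood."""
    r = np.zeros(P.shape[1])
    for _ in range(steps):
        _, g = log_likelihood_and_grad(P, r, demos, **kwargs)
        r += lr * g
    return r


# Toy usage: a 5-state chain where action 0 moves left and action 1 moves right.
S, A = 5, 2
P = np.zeros((A, S, S))
for s in range(S):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, S - 1)] = 1.0
demos = [(s, 1) for s in range(S)]  # the "expert" always moves right

r_learned = learn_reward(P, demos)
print(np.round(r_learned, 3))
```

On this toy chain, the learned reward should increase toward the right end, because that is where the demonstrated actions lead. Note that the likelihood uses only individual (state, action) pairs, never whole trajectories. For a general linear reward `r = Phi @ theta` with a feature matrix `Phi`, the gradient with respect to `theta` would be `Phi.T @ grad`.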

