LG, ML | Jun 20, 2012

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

arXiv:1206.5264v1 | 252 citations

Originality: Incremental advance

AI Analysis

This work addresses apprenticeship learning by improving inverse reinforcement learning. It appears incremental because it builds on existing gradient methods with specific enhancements.

The authors tackle the problem of learning a policy from expert behavior by assuming the expert is optimal for an unknown reward function. They propose a gradient algorithm that uses subdifferentials to handle the nonsmooth mapping from reward parameters to policies and natural gradients to handle its redundancy. They tested it in two artificial domains and found it more reliable and efficient than some previous methods.

In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy matches well the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
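The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and setting in it is an assumption made for illustration: a small tabular MDP, a linear reward `r = Phi @ theta`, a Boltzmann policy over the optimal Q-values, and a squared policy-matching loss. Finite differences stand in for the subdifferential (at a kink they simply pick one one-sided slope), and the natural-gradient direction is taken with respect to the metric `J.T @ J` induced by the parameter-to-policy Jacobian `J`, computed through a truncated least-squares solve so that redundant parameter directions are ignored.

```python
import numpy as np

def q_values(P, r, gamma=0.9, iters=100):
    """Value iteration. P: (A, S, S) transitions, r: (S,) state rewards."""
    q = np.zeros(P.shape[:2])
    for _ in range(iters):
        v = q.max(axis=0)                    # greedy state values, shape (S,)
        q = r[None, :] + gamma * (P @ v)     # Bellman backup, shape (A, S)
    return q

def policy(theta, P, Phi, beta=5.0):
    """Boltzmann policy over Q* for the reward Phi @ theta, flattened to (A*S,)."""
    q = q_values(P, Phi @ theta)
    z = np.exp(beta * (q - q.max(axis=0, keepdims=True)))
    return (z / z.sum(axis=0, keepdims=True)).ravel()

def loss(theta, P, Phi, pi_E):
    """Squared distance between the induced policy and the expert policy."""
    return 0.5 * np.sum((policy(theta, P, Phi) - pi_E) ** 2)

def jacobian(theta, P, Phi, eps=1e-5):
    """Forward-difference Jacobian of the policy with respect to theta."""
    base = policy(theta, P, Phi)
    cols = []
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        cols.append((policy(theta + d, P, Phi) - base) / eps)
    return np.stack(cols, axis=1)            # shape (A*S, K)

def fit(P, Phi, pi_E, steps=20):
    """Natural-gradient descent with step halving; the loss never increases."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(steps):
        J = jacobian(theta, P, Phi)
        resid = policy(theta, P, Phi) - pi_E
        # Minimum-norm solution of J x = resid equals pinv(J.T @ J) @ (J.T @ resid),
        # i.e. the plain gradient J.T @ resid preconditioned by the metric J.T @ J.
        nat = np.linalg.lstsq(J, resid, rcond=1e-6)[0]
        current, step = loss(theta, P, Phi, pi_E), 1.0
        while step > 1e-4 and loss(theta - step * nat, P, Phi, pi_E) > current:
            step *= 0.5
        if loss(theta - step * nat, P, Phi, pi_E) <= current:
            theta = theta - step * nat
    return theta

# Hypothetical toy problem: random dynamics and features, synthetic expert.
rng = np.random.default_rng(0)
P = rng.random((2, 5, 5))
P /= P.sum(axis=2, keepdims=True)            # normalize rows into distributions
Phi = rng.random((5, 3))
pi_E = policy(np.array([1.0, -0.5, 0.3]), P, Phi)
theta_hat = fit(P, Phi, pi_E)
```

Because scaling or shifting the reward can leave the induced policy nearly unchanged, `theta_hat` need not equal the parameters used to generate `pi_E`; the quantity to compare is the policy-matching loss before and after fitting.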
