LG ML · Jul 31, 2019

Inverse Reinforcement Learning with Multiple Ranked Experts

Pablo Samuel Castro, Shijian Li, Daqing Zhang

arXiv:1907.13411v16.613 citations · h-index: 66

Originality: Incremental advance

AI Analysis

This work addresses inverse reinforcement learning when demonstrations come from multiple demonstrators grouped into performance ranks rather than from a single expert. The advance is incremental but useful for applications such as analyzing GPS data.

The paper tackles the problem of learning optimal behavior in Markov Decision Processes without a specified reward function by using multiple demonstrators of varying performance, and demonstrates its efficacy on a taxi driver dataset with GPS trajectories.

We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance. We assume the demonstrators are classified into one of k ranks, and use ideas from ordinal regression to find a reward function that maximizes the margin between the different ranks. This approach is based on the idea that agents should not only learn how to behave from experts, but also how not to behave from non-experts. We show there are MDPs where important differences in the reward function would be hidden from existing algorithms by the behaviour of the expert. Our method is particularly useful for problems where we have access to a large set of agent behaviours with varying degrees of expertise (such as through GPS or cellphones). We highlight the differences between our approach and existing methods using a simple grid domain and demonstrate its efficacy on determining passenger-finding strategies for taxi drivers, using a large dataset of GPS trajectories.
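The margin idea in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a linear reward over hypothetical per-demonstrator feature expectations and swaps in a pairwise hinge loss (in the style of a ranking SVM) as a stand-in for the ordinal-regression objective. The function name `fit_ranked_reward`, the toy data, and all hyperparameters are illustrative assumptions.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fit_ranked_reward(features, ranks, epochs=200, lr=0.1, reg=0.01):
    """Fit linear reward weights w so higher-ranked demonstrators score higher.

    features: one feature-expectation vector per demonstrator (assumed given).
    ranks:    one integer rank per demonstrator, larger meaning better.
    Minimizes (reg / 2) * ||w||^2 + mean pairwise hinge loss by subgradient descent.
    """
    dim = len(features[0])
    # Difference vectors (better minus worse) for every pair of unequal rank.
    diffs = []
    for i, j in combinations(range(len(features)), 2):
        if ranks[i] == ranks[j]:
            continue
        hi, lo = (i, j) if ranks[i] > ranks[j] else (j, i)
        diffs.append([a - b for a, b in zip(features[hi], features[lo])])

    w = [0.0] * dim
    for _ in range(epochs):
        grad = [reg * x for x in w]
        for d in diffs:
            # A pair contributes only while its margin w . d is below 1.
            if dot(w, d) < 1.0:
                for k in range(dim):
                    grad[k] -= d[k] / len(diffs)
        w = [x - lr * g for x, g in zip(w, grad)]
    return w


# Hypothetical toy data: six demonstrators, two features, three ranks.
features = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.1, 0.9], [0.0, 1.0]]
ranks = [2, 2, 1, 1, 0, 0]

w = fit_ranked_reward(features, ranks)
scores = [dot(w, f) for f in features]
```

In this toy setting the learned weights reward the first feature and penalize the second, so the lower-ranked demonstrators act as negative examples, which mirrors the abstract's point about learning how not to behave from non-experts.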
