Inverse Reinforcement Learning via Deep Gaussian Process
This work addresses the problem of inferring reward functions from demonstrations, a core task in robotics and AI. The contribution appears incremental, since it builds on existing deep GP and IRL frameworks.
The authors tackle the problem of learning complex reward structures from few demonstrations in inverse reinforcement learning (IRL). They propose a deep Gaussian process (deep GP) model and report better performance than state-of-the-art methods on benchmarks including "object world" and "highway driving".
We propose a new approach to inverse reinforcement learning (IRL) based on the deep Gaussian process (deep GP) model, which is capable of learning complicated reward structures with few demonstrations. Our model stacks multiple latent GP layers to learn abstract representations of the state feature space, and this representation is linked to the demonstrations through the Maximum Entropy learning framework. Incorporating the IRL engine into the nonlinear latent structure renders existing deep GP inference approaches intractable. To tackle this, we develop a non-standard variational approximation framework that extends previous inference schemes. This allows for an approximate Bayesian treatment of the feature space and guards against overfitting. By carrying out representation learning and inverse reinforcement learning simultaneously, our model outperforms state-of-the-art approaches, as we demonstrate with experiments on standard benchmarks ("object world", "highway driving") and a new benchmark ("binary world").
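To make the link between a reward function and the demonstrations concrete, the following is a minimal sketch of a Maximum Entropy demonstration likelihood on a toy problem. It is not the authors' implementation: the four-state chain MDP, the discount factor, and the names `soft_value_iteration` and `demo_log_likelihood` are all assumptions introduced for illustration, and the reward is a fixed vector standing in for the output of the deep GP. The sketch uses the common soft value iteration form, in which the log-probability of a demonstrated action is Q(s, a) - V(s).

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value_iteration(reward, next_state, gamma=0.9, iters=200):
    """Soft (MaxEnt) value iteration for a deterministic MDP.

    reward[s] is the reward of state s (here a fixed vector; in the paper's
    setting it would come from the deep GP). next_state[s][a] is the state
    reached by taking action a in state s. Returns (Q, V).
    """
    n_states = len(reward)
    n_actions = len(next_state[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        Q = [[reward[s] + gamma * V[next_state[s][a]] for a in range(n_actions)]
             for s in range(n_states)]
        V = [logsumexp(q) for q in Q]  # soft maximum over actions
    return Q, V


def demo_log_likelihood(reward, next_state, demos, gamma=0.9):
    """Log-likelihood of demonstrated (state, action) pairs under the reward."""
    Q, V = soft_value_iteration(reward, next_state, gamma)
    # The stochastic policy is pi(a | s) = exp(Q[s][a] - V[s]).
    return sum(Q[s][a] - V[s] for s, a in demos)


# Hypothetical 4-state chain: action 0 moves left, action 1 moves right.
NEXT_STATE = [[0, 1], [0, 2], [1, 3], [2, 3]]
# Demonstrations that always move right.
DEMOS = [(0, 1), (1, 1), (2, 1)]

reward_right = [0.0, 0.0, 0.0, 1.0]  # reward at the right end of the chain
reward_left = [1.0, 0.0, 0.0, 0.0]   # reward at the left end of the chain

ll_right = demo_log_likelihood(reward_right, NEXT_STATE, DEMOS)
ll_left = demo_log_likelihood(reward_left, NEXT_STATE, DEMOS)
print("log-likelihood, reward on the right:", ll_right)
print("log-likelihood, reward on the left: ", ll_left)
```

The reward that places value at the right end of the chain explains the rightward demonstrations better, so it receives the higher log-likelihood. An IRL method searches over rewards to increase this quantity. In the approach described above, the reward would additionally be constrained by the stacked GP prior and inferred variationally, which this sketch leaves out.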