Automated Feature Selection for Inverse Reinforcement Learning
This work addresses feature design, a bottleneck in IRL for robotics and control domains, offering an incremental improvement over existing methods.
The paper tackles the problem of selecting good features for inverse reinforcement learning in continuous state spaces, where state variables alone are insufficient. It proposes a method that uses polynomial basis functions as candidate features and selects among them based on trajectory probabilities and feature expectations. It demonstrates the method's effectiveness in recovering reward functions for non-linear control tasks.
Inverse reinforcement learning (IRL) is an imitation learning approach to learning reward functions from expert demonstrations. Its use avoids the difficult and tedious procedure of manual reward specification while retaining the generalization power of reinforcement learning. In IRL, the reward is usually represented as a linear combination of features. In continuous state spaces, the state variables alone are not sufficiently rich to be used as features, but it is generally not known which features are good. To address this issue, we propose a method that employs polynomial basis functions to form a candidate set of features, which are shown to allow the matching of statistical moments of state distributions. Feature selection is then performed over the candidates by leveraging the correlation between trajectory probabilities and feature expectations. We demonstrate the approach's effectiveness by recovering reward functions that capture expert policies across non-linear control tasks of increasing complexity. Code, data, and videos are available at https://sites.google.com/view/feature4irl.
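The two ingredients named above can be illustrated with a minimal sketch. It is not the authors' implementation; every function name here (`polynomial_features`, `feature_expectations`, `select_by_correlation`) is hypothetical. When the candidate features are monomials of the state variables up to degree k, their expectations are exactly the raw moments of the state distribution up to order k, which is why matching them matches those moments. For selection, the sketch assumes a maximum-entropy-style model in which log p(τ) is linear in the reward features, so features that drive the reward should correlate with trajectory log-probabilities. It also assumes time-averaged (undiscounted) feature expectations and uses synthetic log-probabilities in place of real ones.

```python
import itertools

import numpy as np


def polynomial_features(states, degree):
    """Map states of shape (T, d) to all monomials of total degree 1..degree.

    Returns the (T, K) feature matrix and the exponent tuple of each column.
    """
    d = states.shape[1]
    exponents = [e for e in itertools.product(range(degree + 1), repeat=d)
                 if 0 < sum(e) <= degree]
    # Each column is prod_j s_j ** e_j for one exponent tuple e.
    feats = np.stack([np.prod(states ** np.array(e), axis=1) for e in exponents], axis=1)
    return feats, exponents


def feature_expectations(trajectories, degree):
    """Time-average the polynomial features of each trajectory -> (N, K).

    Assumption: undiscounted averaging; a discounted sum is a common alternative.
    """
    rows, exponents = [], None
    for states in trajectories:
        feats, exponents = polynomial_features(states, degree)
        rows.append(feats.mean(axis=0))
    return np.array(rows), exponents


def select_by_correlation(fe, log_probs, k):
    """Rank candidates by |Pearson correlation| with trajectory log-probabilities."""
    fe_c = fe - fe.mean(axis=0)
    lp_c = log_probs - log_probs.mean()
    denom = np.linalg.norm(fe_c, axis=0) * np.linalg.norm(lp_c)
    safe = np.where(denom > 0, denom, 1.0)  # avoid dividing by zero for constant features
    corr = np.where(denom > 0, (fe_c.T @ lp_c) / safe, 0.0)
    order = np.argsort(-np.abs(corr))
    return order[:k], corr


# Synthetic check: 200 trajectories of 50 two-dimensional states each.
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(200, 50, 2))
# Hypothetical log-probabilities driven by the time average of s_0^2, plus small noise.
log_probs = (trajectories[:, :, 0] ** 2).mean(axis=1) + 0.01 * rng.normal(size=200)

fe, exponents = feature_expectations(trajectories, degree=2)
selected, corr = select_by_correlation(fe, log_probs, k=1)
print([exponents[i] for i in selected])  # expected: [(2, 0)], i.e. the s_0^2 monomial
```

Ranking by absolute correlation keeps the example short. A real pipeline would likely also need to handle redundancy among correlated candidates, which this sketch does not attempt.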