Active Learning for Risk-Sensitive Inverse Reinforcement Learning
This work addresses the challenge of modeling human risk preferences in safety-critical tasks with risk-sensitive inverse reinforcement learning (RS-IRL). The contribution is incremental: it builds on existing RS-IRL methods and adds active learning to improve efficiency.
The paper tackles inefficient learning in risk-sensitive inverse reinforcement learning (RS-IRL) caused by redundant demonstrations, and proposes an active learning approach built on a probabilistic disturbance sampling scheme. Experimental results show faster convergence with lower variance, while convergence remains unbiased.
One typical assumption in inverse reinforcement learning (IRL) is that human experts act to optimize the expected utility of a stochastic cost with a fixed distribution. This assumption deviates from actual human behavior under ambiguity. Risk-sensitive inverse reinforcement learning (RS-IRL) bridges this gap by assuming that humans act according to a random cost with respect to a set of subjectively distorted distributions instead of a single fixed one. This assumption provides additional flexibility to model humans' risk preferences, represented by a risk envelope, in safety-critical tasks. However, like other learning-from-demonstration techniques, RS-IRL can suffer from inefficient learning due to redundant demonstrations. Inspired by the concept of active learning, this research derives a probabilistic disturbance sampling scheme that enables an RS-IRL agent to query expert support that is likely to expose unrevealed boundaries of the expert's risk envelope. Experimental results confirm that our approach accelerates the convergence of RS-IRL algorithms with lower variance while still guaranteeing unbiased convergence.
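To make the notion of a risk envelope concrete, the following minimal sketch evaluates a random cost as the worst-case expectation over a set of distorted distributions. It is an illustration only: the choice of conditional value-at-risk (CVaR) as the risk measure, the function names, and the numbers are assumptions made here, not details taken from the paper.

```python
def expected_cost(costs, probs):
    """Expectation of a discrete random cost under the distribution `probs`."""
    return sum(c * p for c, p in zip(costs, probs))


def cvar_worst_case(costs, probs, alpha):
    """Worst-case expected cost over the CVaR risk envelope
    {q : 0 <= q_i <= p_i / alpha, sum(q) = 1}.

    Hypothetical helper: CVaR is assumed here as one example of a risk
    measure defined by an envelope of distorted distributions.
    Returns the risk value and the maximizing distorted distribution q.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Greedy solution of the linear program: push as much probability mass
    # as the envelope allows onto the most costly outcomes first.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    q = [0.0] * len(costs)
    remaining = 1.0
    for i in order:
        mass = min(probs[i] / alpha, remaining)
        q[i] = mass
        remaining -= mass
        if remaining <= 0.0:
            break
    return expected_cost(costs, q), q


if __name__ == "__main__":
    costs = [0.0, 1.0, 10.0]   # illustrative outcome costs
    probs = [0.5, 0.4, 0.1]    # nominal (undistorted) distribution
    print("risk-neutral:", expected_cost(costs, probs))
    print("risk-averse :", cvar_worst_case(costs, probs, alpha=0.2)[0])
```

With these illustrative numbers, the risk-neutral expectation is about 1.4, while the envelope with alpha = 0.2 yields about 5.5, because the distorted distribution shifts mass toward the costly outcome. Setting alpha = 1 collapses the envelope to the nominal distribution and recovers the standard expected-cost assumption of IRL.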