Safety-Aware Apprenticeship Learning
This work addresses safety in apprenticeship learning for safety-critical applications. Its contribution is incremental: it integrates existing techniques.
The paper tackles the problem of ensuring safety in apprenticeship learning (AL), where the reward function is unknown. It embeds probabilistic model checking with PCTL specifications inside AL to ensure safety while retaining policy performance, and demonstrates the approach on several AL scenarios where safety is essential.
Apprenticeship learning (AL) is a class of Learning from Demonstration techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure safety while retaining the performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
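The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm and leaves out the counterexample-guided loop entirely. It only shows a reward that is linear in state features, and a check of a bounded-reachability PCTL property of the form P<=p [ F<=k unsafe ] on the Markov chain induced by a fixed policy. All names, the three-state chain, and the numbers are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Set


def linear_reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Reward as a linear combination of state features: R(s) = w . f(s)."""
    return sum(w * f for w, f in zip(weights, features))


def bounded_reach_prob(
    P: List[List[float]], unsafe: Set[int], start: int, horizon: int
) -> float:
    """Probability of reaching an unsafe state within `horizon` steps.

    P is the row-stochastic transition matrix of the Markov chain obtained
    by fixing a policy in the MDP (an assumption of this sketch).
    """
    n = len(P)
    # prob[s] = probability of hitting `unsafe` from s within 0 steps
    prob = [1.0 if s in unsafe else 0.0 for s in range(n)]
    for _ in range(horizon):
        # One backward step: unsafe states stay at 1, others average successors
        prob = [
            1.0 if s in unsafe else sum(P[s][t] * prob[t] for t in range(n))
            for s in range(n)
        ]
    return prob[start]


def satisfies_safety(
    P: List[List[float]], unsafe: Set[int], start: int, horizon: int, p_max: float
) -> bool:
    """Check the PCTL-style property P<=p_max [ F<=horizon unsafe ]."""
    return bounded_reach_prob(P, unsafe, start, horizon) <= p_max


# Hypothetical chain: state 0 = start, 1 = unsafe (absorbing), 2 = goal (absorbing)
P_EXAMPLE = [
    [0.5, 0.1, 0.4],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print(linear_reward([0.5, -1.0], [1.0, 0.0]))
    print(bounded_reach_prob(P_EXAMPLE, {1}, start=0, horizon=2))
    print(satisfies_safety(P_EXAMPLE, {1}, start=0, horizon=2, p_max=0.2))
```

In this toy chain, the policy satisfies the property for a threshold of 0.2 but would violate it for a threshold of 0.1. A violation is the kind of result a model checker reports together with a counterexample.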