LG · AI · LO · Oct 11, 2017

Learning Task Specifications from Demonstrations

arXiv:1710.03875v5 · 39 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of safely recombining learned sub-tasks in robotics and other applications. It is an incremental advance because it builds on existing specification methods.

The paper tackles the problem of learning Boolean non-Markovian rewards (specifications) from demonstrations in stochastic environments, enabling safe and interpretable composition of sub-tasks, and demonstrates that this approach helps avoid issues from ad-hoc reward composition.

Real-world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics) demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition.
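To make the abstract's pipeline concrete, here is a minimal Python sketch of specifications as Boolean functions of a whole trace, composition by conjunction, and a MAP-style search over a small candidate pool. Everything in it is an assumption for illustration: the trace encoding, the helper names (`eventually`, `never`, `both`, `log_score`, `map_spec`), the toy traces, and above all the scoring rule. The abstract does not state the paper's likelihood, so the score below is a stand-in that rewards specifications the demonstrations satisfy more often than a set of baseline (random-behavior) traces does, using a Bernoulli KL divergence scaled by the number of demonstrations.

```python
import math
from typing import Callable, Dict, List, Sequence

Trace = Sequence[str]           # hypothetical encoding: a trace is a list of state labels
Spec = Callable[[Trace], bool]  # a specification maps a whole trace to True/False


def eventually(label: str) -> Spec:
    """Satisfied if the trace visits `label` at some step."""
    return lambda trace: label in trace


def never(label: str) -> Spec:
    """Satisfied if the trace never visits `label`."""
    return lambda trace: label not in trace


def both(a: Spec, b: Spec) -> Spec:
    """Conjunction: specifications compose with ordinary Boolean logic."""
    return lambda trace: a(trace) and b(trace)


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), using the convention 0 * log(0) = 0."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)  # keep the logarithms finite
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


def satisfaction_rate(spec: Spec, traces: List[Trace]) -> float:
    """Fraction of traces that satisfy the specification."""
    return sum(spec(t) for t in traces) / len(traces)


def log_score(spec: Spec, demos: List[Trace], baseline: List[Trace],
              log_prior: float = 0.0) -> float:
    """Assumed (illustrative) log-posterior score, not the paper's stated model."""
    p_demo = satisfaction_rate(spec, demos)
    p_base = satisfaction_rate(spec, baseline)
    if p_demo < p_base:
        # Demonstrations satisfy the spec less often than chance: rule it out.
        return -math.inf
    return log_prior + len(demos) * bernoulli_kl(p_demo, p_base)


def map_spec(candidates: Dict[str, Spec], demos: List[Trace],
             baseline: List[Trace]) -> str:
    """Exhaustive search for the highest-scoring candidate (uniform prior)."""
    return max(candidates, key=lambda name: log_score(candidates[name], demos, baseline))


# Toy data: demonstrations reach the goal without touching lava.
demos = [
    ["start", "dry", "goal"],
    ["start", "goal"],
    ["start", "dry", "dry", "goal"],
    ["start", "dry", "goal"],
]
# Hand-written stand-ins for traces produced by random behavior.
baseline = [
    ["start", "lava", "goal"],
    ["start", "dry", "dry"],
    ["start", "lava", "lava"],
    ["start", "goal"],
]
candidates = {
    "reach goal": eventually("goal"),
    "avoid lava": never("lava"),
    "reach goal and avoid lava": both(eventually("goal"), never("lava")),
    "touch lava": eventually("lava"),
}

best = map_spec(candidates, demos, baseline)
print(best)
```

On this toy data the conjunction scores highest, because the demonstrations always satisfy it while the baseline traces rarely do. The search here is a plain exhaustive scan, which is only workable for a handful of candidates; the abstract describes a more efficient search over a large candidate pool, which this sketch does not attempt to reproduce.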
