The Logical Options Framework
The paper addresses efficient policy learning and composition in hierarchical reinforcement learning for environments with complex rules, though the contribution appears incremental because it builds on existing hierarchical methods.
The paper tackles learning composable policies for complex environments by introducing the Logical Options Framework (LOF), which learns satisfying and optimal policies by representing tasks as automata, and demonstrates that these policies can be composed for unseen tasks with only 10-50 retraining steps.
Learning composable policies for environments with complex rules and tasks is a challenging problem. We introduce a hierarchical reinforcement learning framework called the Logical Options Framework (LOF) that learns policies that are satisfying, optimal, and composable. LOF efficiently learns policies that satisfy tasks by representing the task as an automaton and integrating it into learning and planning. We provide and prove conditions under which LOF will learn satisfying, optimal policies. Finally, we show how LOF's learned policies can be composed to satisfy unseen tasks with only 10-50 retraining steps. We evaluate LOF on four tasks in discrete and continuous domains, including a 3D pick-and-place environment.
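To make the idea of "representing the task as an automaton and integrating it into learning and planning" concrete, here is a minimal Python sketch. It is not the authors' implementation: the task ("reach a, then reach b"), the automaton, the location names, and the deterministic option costs are all hypothetical, and the planner is plain value iteration over pairs of (automaton state, location) that chooses among one option per subgoal.

```python
from itertools import product

# Hypothetical task automaton for "reach a, then reach b".
# States: 0 (nothing done), 1 (a reached), 2 (goal).
TRANSITIONS = {(0, "a"): 1, (1, "b"): 2}
GOAL = 2
FSA_STATES = [0, 1, 2]
SUBGOALS = ["a", "b"]
LOCATIONS = ["start", "a", "b"]

# Hypothetical, deterministic cost (in steps) of running the option that
# drives the agent from a location to a subgoal. Assumed for illustration.
COST = {
    ("start", "a"): 3, ("start", "b"): 5,
    ("a", "b"): 4, ("b", "a"): 4,
}


def fsa_step(q, prop):
    """Advance the automaton on a proposition; stay put if no edge matches."""
    return TRANSITIONS.get((q, prop), q)


def plan(n_iters=20):
    """Value iteration over (automaton state, location), minimizing total cost."""
    inf = float("inf")
    V = {(q, loc): (0.0 if q == GOAL else inf)
         for q, loc in product(FSA_STATES, LOCATIONS)}
    policy = {}
    for _ in range(n_iters):
        for q, loc in product(FSA_STATES, LOCATIONS):
            if q == GOAL:
                continue
            best_cost, best_opt = inf, None
            for g in SUBGOALS:
                if g == loc:
                    continue  # skip the no-op option of staying at the subgoal
                # Running option g ends at subgoal g and emits proposition g.
                cost = COST[(loc, g)] + V[(fsa_step(q, g), g)]
                if cost < best_cost:
                    best_cost, best_opt = cost, g
            V[(q, loc)] = best_cost
            policy[(q, loc)] = best_opt
    return V, policy


V, policy = plan()
print(policy[(0, "start")], V[(0, "start")])
```

Under these assumptions, swapping in a different `TRANSITIONS` table changes the task while the per-subgoal options stay fixed, which is the intuition behind composing learned policies for unseen tasks; only the high-level planning step is rerun.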