AI, Jul 6, 2020

Goal Kernel Planning: Linearly-Solvable Non-Markovian Policies for Logical Tasks with Goal-Conditioned Options

Thomas J. Ringstrom, Mohammadhosein Hasanbeig, Alessandro Abate

arXiv:2007.02527v24.11 citations

Originality: Incremental advance

AI Analysis

This paper addresses the problem of handling complex structured tasks with logical conditions in hierarchical planning for AI systems. The advance appears incremental, since it builds on the existing LMDP and Options frameworks.

The paper tackles the complexity of solving non-Markovian Boolean sub-goal tasks with ordering constraints in hierarchical planning by introducing LS-GKDP, a compositional framework that combines LMDPs with the Options Framework to enable efficient optimization of meta-policies and zero-shot task transfer in some cases.

In the domain of hierarchical planning, compositionality, abstraction, and task transfer are crucial for designing algorithms that can efficiently solve a variety of problems with maximal representational reuse. Many real-world problems require non-Markovian policies to handle complex structured tasks with logical conditions, often leading to prohibitively large state representations; this requires efficient methods for breaking these problems down and reusing structure between tasks. To this end, we introduce a compositional framework called Linearly-Solvable Goal Kernel Dynamic Programming (LS-GKDP) to address the complexity of solving non-Markovian Boolean sub-goal tasks with ordering constraints. LS-GKDP combines the Linearly-Solvable Markov Decision Process (LMDP) formalism with the Options Framework of Reinforcement Learning. LMDPs can be efficiently solved as a principal eigenvector problem, and options are policies with termination conditions used as temporally extended actions; with LS-GKDP we extend LMDPs to control over options for logical tasks. This involves decomposing a high-dimensional problem into a set of goal-conditioned options for each goal and constructing a goal kernel, which is an abstract transition kernel that jumps from an option's initial states to its termination states along with an update of the higher-level task state. We show how an LMDP with a goal kernel enables the efficient optimization of meta-policies in a lower-dimensional subspace defined by the task grounding. Options can also be remapped to new problems within a super-exponential space of tasks without significant recomputation, and we identify cases where the solution is invariant to the task grounding, permitting zero-shot task transfer.
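The claim that an LMDP reduces to a principal eigenvector problem can be illustrated with a minimal sketch. It assumes the standard first-exit LMDP formulation, in which the desirability function z satisfies z = diag(exp(-q)) P z for passive dynamics P and state costs q, and the optimal policy reweights P by z. The five-state chain, the unit costs, and the function names below are hypothetical choices made for illustration; this is not the paper's LS-GKDP or its goal kernel, only the underlying LMDP solve.

```python
import math

def solve_lmdp(P, q, iters=500):
    """Power iteration for the desirability z = diag(exp(-q)) P z.

    Assumes goal states are absorbing in P and have zero cost, so the
    eigenvalue of interest is 1 and no renormalisation is needed.
    """
    n = len(q)
    z = [1.0] * n
    for _ in range(iters):
        z = [math.exp(-q[i]) * sum(P[i][j] * z[j] for j in range(n))
             for i in range(n)]
    return z

def optimal_policy(P, z):
    """Optimal controlled transitions: u(j|i) proportional to P[i][j] * z[j]."""
    policy = []
    for row in P:
        weights = [p * zj for p, zj in zip(row, z)]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy

def chain(n):
    """Hypothetical passive dynamics: random walk on a chain, last state absorbing."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        P[i][max(i - 1, 0)] += 0.5  # step left (or stay put at the left wall)
        P[i][i + 1] += 0.5          # step right
    P[n - 1][n - 1] = 1.0           # goal state is absorbing
    return P

n = 5
P = chain(n)
q = [1.0] * (n - 1) + [0.0]  # unit cost everywhere except the goal
z = solve_lmdp(P, q)
pi = optimal_policy(P, z)
```

In this toy problem the desirability grows toward the goal, so the resulting policy shifts probability mass to the right relative to the unbiased passive random walk.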
