AI SY

Oct 13, 2024

Generalization of Compositional Tasks with Logical Specification via Implicit Planning

arXiv:2410.09686v25.82 citations

h-index: 38

ECML/PKDD

Originality: Incremental advance

AI Analysis

This paper addresses slow convergence and sub-optimal performance in reinforcement learning for long-horizon compositional tasks; it represents an incremental improvement.

The paper tackles the challenge of learning generalizable policies for compositional tasks with logical specifications by introducing a hierarchical RL framework with an implicit planner, achieving improved efficiency and optimality over previous methods.

In this study, we address the challenge of learning generalizable policies for compositional tasks defined by logical specifications. These tasks consist of multiple temporally extended sub-tasks. Because of inter-dependencies among sub-tasks and the sparse rewards of long-horizon tasks, existing reinforcement learning (RL) approaches, such as task-conditioned and goal-conditioned policies, still suffer from slow convergence and sub-optimal performance when generalizing to compositional tasks. To overcome these limitations, we introduce a new hierarchical RL framework that improves the efficiency and optimality of task generalization. At the high level, we present an implicit planner designed specifically for generalizing to compositional tasks. This planner selects the next sub-task and estimates the multi-step return for completing the remaining task from the current state. It learns a latent transition model and performs planning in the latent space using a graph neural network (GNN). The sub-task selected by the high-level planner then guides the low-level agent through long-horizon tasks, while the multi-step return encourages the low-level policy to account for future sub-task dependencies, improving its optimality. We conduct comprehensive experiments to demonstrate the framework's advantages over previous methods in terms of both efficiency and optimality.
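The high-level planner's role described above (given the current state, return the next sub-task together with a multi-step return for the remaining task) can be illustrated with a small sketch. This is not the paper's method: the learned latent transition model and GNN are replaced here by an explicit exhaustive search over a hand-written toy task, and every name and value (`POSITIONS`, `PREREQS`, `GAMMA`, `plan`, `rollout`) is a hypothetical choice made for illustration.

```python
from functools import lru_cache

# Hypothetical toy task: three sub-tasks at 1-D positions, with the
# precedence constraint "a before c". None of these values come from the paper.
POSITIONS = {"a": 4, "b": 1, "c": 5}
PREREQS = {"a": frozenset(), "b": frozenset(), "c": frozenset({"a"})}
GAMMA = 0.9  # discount applied once per completed sub-task (assumed)


def feasible(done):
    """Sub-tasks not yet done whose prerequisites are all satisfied."""
    return [s for s in sorted(POSITIONS) if s not in done and PREREQS[s] <= done]


@lru_cache(maxsize=None)
def plan(pos, done=frozenset()):
    """Return (best multi-step return, next sub-task) from this state.

    Explicit search stands in for the learned implicit planner.
    """
    options = feasible(done)
    if not options:
        return 0.0, None  # nothing left to do
    best = None
    for s in options:
        reward = -abs(POSITIONS[s] - pos)  # travel cost as a negative reward
        future, _ = plan(POSITIONS[s], done | {s})
        total = reward + GAMMA * future
        if best is None or total > best[0]:
            best = (total, s)
    return best


def rollout(pos=0):
    """Follow the planner's choices and return the sub-task order."""
    done, order = frozenset(), []
    while True:
        _, nxt = plan(pos, done)
        if nxt is None:
            return order
        order.append(nxt)
        pos, done = POSITIONS[nxt], done | {nxt}
```

With these toy values, `rollout(0)` visits the sub-tasks in the order `b`, `a`, `c`. The return attached to each choice already accounts for the cost of the sub-tasks that follow it, which is the kind of signal the abstract says is passed to the low-level policy.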
