LG AI · Mar 22, 2024

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Guillermo Infante, David Kuric, Anders Jonsson, Vicenç Gómez, Herke van Hoof

arXiv:2403.15301v2 · 6.44 citations · h-index: 28 · ICAPS

Originality: Highly original

AI Analysis

This paper addresses predictable policy generalization in complex sequential decision-making. It is aimed at reinforcement learning practitioners and proposes a novel method rather than an incremental improvement.

The paper tackles the challenge of learning policies that generalize across multiple tasks with non-Markovian rewards. It uses successor features to learn a policy basis whose (sub)policies can be combined into an optimal solution without additional learning. Unlike other planning-based approaches, the method asymptotically attains global optimality, even in stochastic environments.

Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
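To make the idea of a policy basis built from successor features more concrete, the following is a minimal sketch on a toy problem, not the authors' algorithm. Everything in it is an assumption made for illustration: the five-state deterministic corridor, the two exit states acting as subgoals, the discount factor, and all function names. It also combines the base policies with generalized policy improvement (picking the action with the highest value across all base policies), which is a simpler stand-in for the FSA-based planning the abstract describes. The point it shows is that once the successor features of each base policy are known, the value of any reward weighting `w` over the subgoals follows from a dot product, with no further learning.

```python
# Toy illustration only: a deterministic corridor with states 0..4.
# States 0 and 4 are exits (hypothetical subgoals); each exit has its own feature.
GAMMA = 0.9
N_STATES = 5
TERMINALS = {0: 0, 4: 1}   # exit state -> index of the feature it triggers
ACTIONS = (0, 1)           # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: move one cell left or right."""
    return state - 1 if action == 0 else state + 1


def features(next_state):
    """Feature vector phi: one-hot on the exit that was just entered."""
    phi = [0.0, 0.0]
    if next_state in TERMINALS:
        phi[TERMINALS[next_state]] = 1.0
    return phi


def successor_features(policy, sweeps=50):
    """Iteratively evaluate psi(s, a) = phi + GAMMA * psi(s', policy(s'))."""
    psi = {s: {a: [0.0, 0.0] for a in ACTIONS} for s in range(1, N_STATES - 1)}
    for _ in range(sweeps):
        for s in psi:
            for a in ACTIONS:
                s_next = step(s, a)
                phi = features(s_next)
                if s_next in TERMINALS:
                    psi[s][a] = phi  # episode ends, no future features
                else:
                    nxt = psi[s_next][policy(s_next)]
                    psi[s][a] = [phi[i] + GAMMA * nxt[i] for i in range(2)]
    return psi


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gpi_action(state, w, basis):
    """Pick the action whose best value over all base policies is highest."""
    return max(ACTIONS, key=lambda a: max(dot(psi[state][a], w) for psi in basis))


# Policy basis: one base policy per subgoal (always left, always right).
psi_left = successor_features(lambda s: 0)
psi_right = successor_features(lambda s: 1)
basis = [psi_left, psi_right]
```

For example, `gpi_action(2, (1.0, 2.0), basis)` evaluates a task that values the right exit twice as much as the left one, reusing the same `basis` that would serve any other weighting. In this sketch the weights are fixed per task; in the setting of the abstract, the task is instead described by a finite state automaton over the subproblems.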
