LG, AI | Mar 27, 2025

A tale of two goals: leveraging sequentiality in multi-goal scenarios

Olivier Serris, Stéphane Doncieux, Olivier Sigaud

arXiv:2503.21677v1 | 4.1 | h-index: 17

Originality: Incremental advance

AI Analysis

This paper addresses a specific bottleneck in multi-goal hierarchical reinforcement learning and offers an incremental improvement for scenarios with sequential goals.

The paper tackles a failure mode of hierarchical reinforcement learning: an intermediate goal can be reached in ways that block subsequent goals. It introduces two MDPs that condition the policy on more than one upcoming goal (current and final, or the next two in the sequence). On navigation and pole-balancing tasks, conditioning on the next two goals improves stability and sample efficiency in most cases.

Several hierarchical reinforcement learning methods leverage planning to create a graph or sequences of intermediate goals, guiding a lower-level goal-conditioned (GC) policy to reach some final goals. The low-level policy is typically conditioned on the current goal, with the aim of reaching it as quickly as possible. However, this approach can fail when an intermediate goal can be reached in multiple ways, some of which may make it impossible to continue toward subsequent goals. To address this issue, we introduce two instances of Markov Decision Processes (MDPs) where the optimization objective favors policies that not only reach the current goal but also subsequent ones. In the first, the agent is conditioned on both the current and final goals, while in the second, it is conditioned on the next two goals in the sequence. We conduct a series of experiments on navigation and pole-balancing tasks in which sequences of intermediate goals are given. By evaluating policies trained with TD3+HER on both the standard GC-MDP and our proposed MDPs, we show that, in most cases, conditioning on the next two goals improves stability and sample efficiency over other approaches.
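
The three conditioning schemes in the abstract differ only in which goals are fed to the low-level policy. The sketch below illustrates that difference and nothing more: it does not reproduce the paper's reward or training setup. The names `goal_input` and `policy_input`, the mode strings, and the choice to repeat the final goal when no successor exists are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def goal_input(goals: Sequence[Vector], idx: int, mode: str) -> Vector:
    """Return the goal part of the policy input when goals[idx] is the current goal.

    mode (hypothetical names):
      "current"       - standard GC-MDP: current goal only
      "current_final" - current goal plus the final goal of the sequence
      "next_two"      - current goal plus the goal that follows it
    """
    if not 0 <= idx < len(goals):
        raise IndexError("idx must index into goals")
    current = list(goals[idx])
    if mode == "current":
        return current
    if mode == "current_final":
        return current + list(goals[-1])
    if mode == "next_two":
        # Assumption: at the last goal there is no successor, so the final
        # goal is repeated to keep the input size fixed.
        successor = goals[min(idx + 1, len(goals) - 1)]
        return current + list(successor)
    raise ValueError(f"unknown mode: {mode}")


def policy_input(state: Vector, goals: Sequence[Vector], idx: int, mode: str) -> Vector:
    """Concatenate the state with the goal conditioning for the chosen scheme."""
    return list(state) + goal_input(goals, idx, mode)


if __name__ == "__main__":
    goals = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    state = [0.0, 0.0]
    for mode in ("current", "current_final", "next_two"):
        print(mode, policy_input(state, goals, 0, mode))
```

With a three-goal sequence and the first goal active, the "current" input carries one goal, while the other two modes carry two. The policy can therefore, in principle, prefer ways of reaching the current goal that keep the later one reachable.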
