LG | AI | Mar 27, 2025

A tale of two goals: leveraging sequentiality in multi-goal scenarios

arXiv:2503.21677v1 | h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses a specific bottleneck in multi-goal hierarchical reinforcement learning and offers an incremental improvement for scenarios with sequential goals.

Hierarchical reinforcement learning can fail when an intermediate goal is reached in a way that blocks the goals after it. The paper addresses this by introducing MDPs that condition the policy on more than one future goal. On navigation and pole-balancing tasks, conditioning on the next two goals improves stability and sample efficiency in most cases.

Several hierarchical reinforcement learning methods leverage planning to create a graph or sequence of intermediate goals, guiding a lower-level goal-conditioned (GC) policy toward some final goal. The low-level policy is typically conditioned on the current goal, with the aim of reaching it as quickly as possible. However, this approach can fail when an intermediate goal can be reached in multiple ways, some of which may make it impossible to continue toward subsequent goals. To address this issue, we introduce two Markov Decision Process (MDP) formulations in which the optimization objective favors policies that not only reach the current goal but also subsequent ones. In the first, the agent is conditioned on both the current and final goals, while in the second, it is conditioned on the next two goals in the sequence. We conduct a series of experiments on navigation and pole-balancing tasks in which sequences of intermediate goals are given. By evaluating policies trained with TD3+HER on both the standard GC-MDP and our proposed MDPs, we show that, in most cases, conditioning on the next two goals improves stability and sample efficiency over other approaches.
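The three conditioning schemes described in the abstract differ mainly in what is concatenated to the state before it reaches the low-level policy. The sketch below illustrates only that input construction. The function name `policy_input`, the mode labels, and the choice to repeat the final goal when no further goal exists are assumptions made for illustration, not details taken from the paper, and the reward definitions of the proposed MDPs are not reproduced here.

```python
from typing import List, Sequence


def policy_input(
    state: Sequence[float],
    goals: Sequence[Sequence[float]],
    index: int,
    mode: str,
) -> List[float]:
    """Build a flat input vector for a goal-conditioned policy.

    state: current state of the agent.
    goals: given sequence of intermediate goals; the last entry is the final goal.
    index: position of the current goal in `goals`.
    mode:  hypothetical label selecting the conditioning scheme.
    """
    current = list(goals[index])

    if mode == "current":
        # Standard GC-MDP: condition on the current goal only.
        extra: List[float] = []
    elif mode == "current_final":
        # First variant: current goal plus the final goal.
        extra = list(goals[-1])
    elif mode == "next_two":
        # Second variant: current goal plus the goal after it.
        # Assumption: when the current goal is the last one, repeat it.
        following = goals[min(index + 1, len(goals) - 1)]
        extra = list(following)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return list(state) + current + extra


if __name__ == "__main__":
    state = [0.0, 0.0]
    goals = [[1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
    for mode in ("current", "current_final", "next_two"):
        print(mode, policy_input(state, goals, 0, mode))
```

With two-dimensional states and goals, the standard scheme yields a 4-element input while both proposed schemes yield 6 elements, so the policy network's input layer has to be sized accordingly.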

