LG · AI · May 22

Goal-Conditioned Agents that Learn Everything All at Once

arXiv:2605.23551 · 27.0 · Has Code
Predicted impact: top 21% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For reinforcement learning practitioners, LEO enables efficient all-goals learning at scale, addressing the computational bottleneck of naive relabeling.

The paper introduces LEO, a method for all-goals learning in goal-conditioned reinforcement learning that outputs values and actions for every goal simultaneously. It achieves a >250x speed-up over naive all-goals relabeling, significantly outperforms other methods on goal-conditioned Craftax, and is competitive with existing baselines on continuous control tasks.

A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal. All-goals learning, where each transition is used for learning off-policy with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling. This can be overcome by jointly outputting values and actions for every goal at once, allowing for efficient, parallel all-goals updates with a single pass through the network, in a process we call Learning Everything all at Once (LEO). We show that this approach significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a >250x speed-up compared to all-goals relabelling. We then show that this approach can be made even more powerful by using LEO as a teacher network rather than as a direct actor. We hope that, by unlocking all-goals learning at scale, LEO can serve as a useful tool for RL practitioners in complex environments. We open-source our code.
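To make the idea of an all-goals update concrete, here is a minimal tabular sketch. It is not the paper's implementation: LEO uses a network that outputs values and actions for every goal in one pass, whereas this stand-in stores one value per goal in a table entry `q[s][a]`. The chain environment, the sparse reward of 1 on reaching the goal state, the Q-learning target, and every function name below are assumptions made for illustration only.

```python
# Tabular stand-in for an all-goals update (illustrative, not the LEO code).
# q[s][a] holds one value per goal, so a single transition (s, a, s_next)
# yields an off-policy update for every goal, with no per-goal relabelling pass.
# Assumptions: goals are states, reward is 1 on reaching the goal, else 0.

def make_q(n_states, n_actions):
    """Q-table indexed as q[state][action][goal], initialised to zero."""
    return [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]

def all_goals_update(q, s, a, s_next, gamma=0.9, lr=0.5):
    """Apply one Q-learning step to every goal using a single transition."""
    n_goals = len(q[s][a])
    n_actions = len(q[s_next])
    for g in range(n_goals):
        reached = (s_next == g)
        reward = 1.0 if reached else 0.0
        # Reaching the goal ends the episode for that goal: no bootstrap.
        bootstrap = 0.0 if reached else max(q[s_next][b][g] for b in range(n_actions))
        target = reward + gamma * bootstrap
        q[s][a][g] += lr * (target - q[s][a][g])
    return q

def step(s, a, n_states=4):
    """Hypothetical chain environment: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

def train(n_states=4, sweeps=50):
    """Sweep every (state, action) pair; each transition teaches all goals."""
    q = make_q(n_states, 2)
    for _ in range(sweeps):
        for s in range(n_states):
            for a in range(2):
                all_goals_update(q, s, a, step(s, a, n_states))
    return q

def greedy(q, s, g):
    """Greedy action in state s when commanded to reach goal g."""
    return max(range(len(q[s])), key=lambda a: q[s][a][g])

if __name__ == "__main__":
    q = train()
    print(greedy(q, 0, 3), greedy(q, 3, 0))
```

In this sketch the inner loop over goals plays the role that the multi-goal output head would play in a network: the data (one transition) is consumed once, and all goal-specific values are refreshed together. In a vectorised or neural setting, that loop would become a single batched operation over the goal dimension.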

