AI, LG | Apr 27

Hierarchical Behaviour Spaces

Michael Tryfan Matthews, Anssi Kanervisto, Jakob Foerster, Pierluca D'Oro, Scott Fujimoto, Mikael Henaff

arXiv:2604.24558

AI Analysis

For researchers in hierarchical reinforcement learning, this work offers a method for making the set of representable policies more expressive. The authors' experiments attribute the benefits of hierarchy to increased exploration rather than long-term reasoning.

The paper introduces Hierarchical Behaviour Spaces (HBS), in which the controller specifies linear combinations over a set of predefined option reward functions instead of using a single reward function per option. This induces a more expressive set of policies. Evaluated on the NetHack Learning Environment, HBS shows strong performance, and the authors attribute the benefits of hierarchy to increased exploration rather than long-term reasoning.

Recent work in hierarchical reinforcement learning has shown success in scaling to billions of timesteps when learning over a set of predefined option reward functions. We show that, instead of using a single reward function per option, the reward functions can be used to induce a space of behaviours: the controller specifies linear combinations over the reward functions, which allows a more expressive set of policies to be represented. We call this method Hierarchical Behaviour Spaces (HBS). We evaluate HBS on the NetHack Learning Environment, demonstrating strong performance. We conduct a series of experiments and determine that, perhaps going against conventional wisdom, the benefits of hierarchy in our method come from increased exploration rather than long-term reasoning.
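
To make the central idea concrete, here is a minimal sketch of a reward formed as a linear combination of option reward functions. It is an illustration under stated assumptions, not the authors' implementation. The function name `combined_reward`, the three example reward values, and the weight vectors are all hypothetical, and the abstract does not say how the weights are constrained or how the low-level policy is conditioned on them.

```python
from typing import Sequence


def combined_reward(weights: Sequence[float], option_rewards: Sequence[float]) -> float:
    """Weighted sum of per-option rewards for a single transition.

    `weights` stands in for what the controller outputs; `option_rewards`
    holds the value of each predefined option reward function on this step.
    Both names are hypothetical and chosen only for this sketch.
    """
    if len(weights) != len(option_rewards):
        raise ValueError("need exactly one weight per option reward")
    return sum(w * r for w, r in zip(weights, option_rewards))


# Hypothetical values of three predefined option reward functions on one step.
option_rewards = [1.0, 2.0, -0.5]

# "A single reward function per option" corresponds to a one-hot weight vector.
single_option = combined_reward([0.0, 1.0, 0.0], option_rewards)

# A behaviour space lets the controller choose any weight vector instead,
# blending the option rewards into a new training signal.
blended = combined_reward([0.5, 0.25, 0.5], option_rewards)

print(single_option)  # 2.0
print(blended)        # 0.75
```

In this framing, the discrete choice among options becomes a special case of the weight vector, which is one way to read the abstract's claim that a more expressive set of policies can be represented.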
