LG · Aug 25, 2025

Multi-layer Abstraction for Nested Generation of Options (MANGO) in Hierarchical Reinforcement Learning

Alessio Arcudi, Davide Sartor, Alberto Sinigaglia, Vincent François-Lavet, Gian Antonio Susto

arXiv:2508.17751v1 · h-index: 7 · IFAC-PapersOnLine

Originality: Incremental advance

AI Analysis

The work addresses sample-efficiency and interpretability problems in reinforcement learning for safety-critical and industrial applications, though it appears incremental because it builds on existing hierarchical methods.

This paper tackles the challenge of long-term, sparse-reward environments in reinforcement learning by introducing MANGO, a hierarchical framework that decomposes tasks into multiple abstraction layers with nested options, yielding substantial improvements in sample efficiency and generalization in grid environments.

This paper introduces MANGO (Multi-layer Abstraction for Nested Generation of Options), a novel hierarchical reinforcement learning framework designed to address the challenges of long-term, sparse-reward environments. MANGO decomposes complex tasks into multiple layers of abstraction, where each layer defines an abstract state space and employs options to modularize trajectories into macro-actions. These options are nested across layers, allowing learned movements to be reused efficiently and improving sample efficiency. The framework also introduces intra-layer policies, which guide the agent's transitions within each abstract state space, and task actions, which integrate task-specific components such as reward functions. Experiments conducted in procedurally generated grid environments demonstrate substantial improvements in both sample efficiency and generalization compared with standard RL methods. MANGO also enhances interpretability by making the agent's decision-making process transparent across layers, which is particularly valuable in safety-critical and industrial applications. Future work will explore automated discovery of abstractions and abstract actions, adaptation to continuous or fuzzy environments, and more robust multi-layer training strategies.
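The nesting of options across layers can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an open square grid with no walls, abstraction layers that group cells into 2^k × 2^k blocks, and the same four directional abstract actions at every layer. The names `abstract`, `step`, `intra_layer_policy`, and `run_option` are hypothetical, the greedy `intra_layer_policy` is a hand-coded stand-in for the learned intra-layer policies described above, and task actions and rewards are omitted.

```python
# Hypothetical abstract action set: the same four moves exist at every layer.
DIRS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def abstract(state, layer):
    """Map a grid cell to its abstract state at `layer` (assumed 2**layer x 2**layer blocks)."""
    x, y = state
    s = 2 ** layer
    return (x // s, y // s)


def step(state, direction, size):
    """Primitive move on an open size x size grid; moves off the grid are no-ops."""
    dx, dy = DIRS[direction]
    nx, ny = state[0] + dx, state[1] + dy
    return (nx, ny) if 0 <= nx < size and 0 <= ny < size else state


def intra_layer_policy(state, layer, target):
    """Greedy stand-in for a learned intra-layer policy.

    Chooses the layer-`layer` abstract action that moves the agent toward
    `target`, an abstract state at layer + 1 (which covers 2x2 layer blocks).
    """
    lx, ly = abstract(state, layer)
    lo_x, lo_y = 2 * target[0], 2 * target[1]
    if lx < lo_x:
        return "right"
    if lx > lo_x + 1:
        return "left"
    if ly < lo_y:
        return "down"
    return "up"


def run_option(state, layer, direction, size, trace):
    """Execute the abstract action `direction` at `layer` as a nested option.

    Layer-0 options are primitive actions. A layer-k option terminates once the
    agent reaches the neighboring layer-k abstract state and is executed by
    repeatedly calling layer-(k-1) options chosen by the intra-layer policy.
    Every primitive action issued is appended to `trace`.
    """
    if layer == 0:
        trace.append(direction)
        return step(state, direction, size)
    sx, sy = abstract(state, layer)
    dx, dy = DIRS[direction]
    target = (sx + dx, sy + dy)
    n_blocks = size // 2 ** layer
    if not (0 <= target[0] < n_blocks and 0 <= target[1] < n_blocks):
        return state  # abstract action not applicable at the grid border
    while abstract(state, layer) != target:
        sub_action = intra_layer_policy(state, layer - 1, target)
        state = run_option(state, layer - 1, sub_action, size, trace)
    return state


if __name__ == "__main__":
    trace = []
    final = run_option((0, 0), layer=2, direction="right", size=8, trace=trace)
    # One layer-2 action expands into two layer-1 options and four primitive moves.
    print(final, trace)  # (4, 0) ['right', 'right', 'right', 'right']
```

Because every layer-k option is built only from layer-(k-1) options, the same layer-1 "right" option is reused twice inside a single layer-2 move, which is the kind of reuse of learned movements the framework relies on. In MANGO the intra-layer policies are learned rather than hand-coded as here.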

