Multi-layer Abstraction for Nested Generation of Options (MANGO) in Hierarchical Reinforcement Learning
The paper addresses sample-efficiency and interpretability problems for reinforcement learning in safety-critical and industrial applications, though the contribution appears incremental because it builds on existing hierarchical methods.
This paper tackles long-term, sparse-reward environments in reinforcement learning by introducing MANGO, a hierarchical framework that decomposes tasks into multiple abstraction layers with nested options. Experiments in grid environments show substantial improvements in sample efficiency and generalization.
This paper introduces MANGO (Multilayer Abstraction for Nested Generation of Options), a novel hierarchical reinforcement learning framework designed for long-term, sparse-reward environments. MANGO decomposes complex tasks into multiple layers of abstraction, where each layer defines an abstract state space and uses options to modularize trajectories into macro-actions. These options are nested across layers, so movements learned at one layer can be reused by the layers above it, which improves sample efficiency. The framework introduces intra-layer policies that guide the agent's transitions within the abstract state space, and task actions that integrate task-specific components such as reward functions. Experiments conducted in procedurally generated grid environments demonstrate substantial improvements in both sample efficiency and generalization compared to standard RL methods. MANGO also enhances interpretability by making the agent's decision-making process transparent across layers, which is particularly valuable in safety-critical and industrial applications. Future work will explore automated discovery of abstractions and abstract actions, adaptation to continuous or fuzzy environments, and more robust multi-layer training strategies.
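To make the idea of nested options more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a square grid, abstraction layers formed by grouping cells into blocks of side `factor ** layer`, and hand-coded options in place of the learned intra-layer policies; the names `abstract`, `run_option`, `MOVES`, and `factor` are hypothetical. A layer-k option terminates once the layer-k abstract state changes, and it acts only by calling layer k-1 options.

```python
from typing import Tuple

Cell = Tuple[int, int]

# Primitive moves on the grid: (d_row, d_col).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def abstract(cell: Cell, layer: int, factor: int = 2) -> Cell:
    """Map a grid cell to its abstract state at `layer` (layer 0 is the grid itself).

    Assumption: each layer groups cells into square blocks of side factor**layer.
    """
    size = factor ** layer
    return (cell[0] // size, cell[1] // size)


def run_option(cell: Cell, direction: str, layer: int,
               grid_size: int, factor: int = 2) -> Tuple[Cell, int]:
    """Execute the option `direction` at `layer`; return (final cell, primitive steps).

    A layer-0 option is a single primitive move. A layer-k option repeatedly
    calls the same-direction option of layer k-1 until the layer-k abstract
    state changes. This fixed rule stands in for a learned intra-layer policy.
    """
    dr, dc = MOVES[direction]
    if layer == 0:
        r, c = cell[0] + dr, cell[1] + dc
        if 0 <= r < grid_size and 0 <= c < grid_size:
            return (r, c), 1
        return cell, 1  # bumping into the boundary leaves the agent in place

    start = abstract(cell, layer, factor)
    steps = 0
    while abstract(cell, layer, factor) == start:
        nxt, used = run_option(cell, direction, layer - 1, grid_size, factor)
        steps += used
        if nxt == cell:  # lower-layer option made no progress: terminate
            break
        cell = nxt
    return cell, steps
```

For instance, on an 8x8 grid, `run_option((0, 0), "right", 2, 8)` crosses one layer-2 block by calling the layer-1 option twice, which in turn issues four primitive moves and ends at cell `(0, 4)`. The reuse is visible in the recursion: the layer-2 option contains no movement logic of its own beyond deciding when to stop.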