Hierarchies of Reward Machines
This work addresses the challenge of complex task decomposition in reinforcement learning for agents, though it is incremental as it builds on existing reward machines and options frameworks.
The authors tackled the problem of long-horizon and sparse reward tasks in reinforcement learning by introducing hierarchies of reward machines (HRMs), which abstract subtask structures through composition, and showed that exploiting a handcrafted HRM leads to faster convergence and that learning an HRM is feasible where flat representations are not.
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We exploit HRMs by treating each call to an RM as an independently solvable subtask using the options framework, and describe a curriculum-based method to learn HRMs from traces observed by the agent. Our experiments reveal that exploiting a handcrafted HRM leads to faster convergence than with a flat HRM, and that learning an HRM is feasible in cases where its equivalent flat representation is not.