Expediting Reinforcement Learning by Incorporating Knowledge About Temporal Causality in the Environment
This work addresses the challenge of designing and modifying reward machines for RL in sparse-reward tasks by offering a method to incorporate causal knowledge, though the contribution appears incremental relative to existing probabilistic reward machine (PRM) frameworks.
The paper tackles reinforcement learning's difficulty with sparse rewards and complex temporal dependencies by incorporating Temporal Logic-based Causal Diagrams into the reward formalism. The authors report expedited policy learning and improved transfer to new environments, supported by a theoretical convergence result and empirical demonstrations.
Reinforcement learning (RL) algorithms struggle to learn optimal policies for tasks where reward feedback is sparse and depends on a complex sequence of events in the environment. Probabilistic reward machines (PRMs) are finite-state formalisms that can capture temporal dependencies in the reward signal, along with nondeterministic task outcomes. While special RL algorithms can exploit this finite-state structure to expedite learning, PRMs remain difficult to design and modify by hand. This hinders the already difficult tasks of utilizing high-level causal knowledge about the environment and transferring the reward formalism to a new domain with a different causal structure. This paper proposes a novel method to incorporate causal information in the form of Temporal Logic-based Causal Diagrams into the reward formalism, thereby expediting policy learning and aiding the transfer of task specifications to new environments. Furthermore, we provide a theoretical result about convergence to an optimal policy for our method, and demonstrate its strengths empirically.
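To make the finite-state structure concrete, the following is a minimal sketch of a probabilistic reward machine. It illustrates the general idea only and is not the method proposed in this paper: the class name `ProbabilisticRewardMachine`, the event labels `"key"` and `"door"`, the state names, and the probabilities are all hypothetical, and the sketch assumes that each (state, event) pair maps to a list of weighted (next state, reward) outcomes.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProbabilisticRewardMachine:
    """Toy PRM (illustrative assumption, not the paper's implementation).

    transitions maps (state, event_label) to a list of
    (probability, next_state, reward) outcomes, which models
    nondeterministic task outcomes.
    """
    initial_state: str
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial_state

    def reset(self):
        """Return the machine to its initial state."""
        self.state = self.initial_state

    def step(self, label, rng=random):
        """Advance on an observed event label and return the reward."""
        outcomes = self.transitions.get((self.state, label))
        if outcomes is None:
            # Events with no listed transition leave the state unchanged.
            return 0.0
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        self.state = next_state
        return reward


# Hypothetical sparse-reward task: pick up a key, then open a door.
# The door opens (reward 1) with probability 0.9, otherwise nothing happens.
prm = ProbabilisticRewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "key"): [(1.0, "u1", 0.0)],
        ("u1", "door"): [(0.9, "u_done", 1.0), (0.1, "u1", 0.0)],
    },
)

rng = random.Random(0)  # seeded for reproducibility
for event in ["door", "key", "door", "door"]:
    reward = prm.step(event, rng)
    print(event, prm.state, reward)
```

In this sketch, trying the door before the key yields no reward and no state change, which is the kind of temporal dependency in the reward signal that the machine state is meant to track.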