Finding Useful Predictions by Meta-gradient Descent to Improve Decision-making
This work addresses the problem of automating prediction selection in reinforcement learning, of interest to both researchers and practitioners, and represents an incremental improvement over manual specification methods.
The paper tackles the challenge of selecting useful predictions for decision-making in reinforcement learning. It introduces a meta-gradient descent method that allows agents to choose predictions autonomously, achieving performance comparable to that of expertly chosen value functions in a partially observable domain.
In computational reinforcement learning, a growing body of work seeks to express an agent's model of the world through predictions about future sensations. In this manuscript we focus on predictions expressed as General Value Functions: temporally extended estimates of the accumulation of a future signal. One challenge is determining which of the infinitely many predictions that the agent could possibly make might support decision-making. In this work, we contribute a meta-gradient descent method by which an agent can directly specify what predictions it learns, independent of designer instruction. To that end, we introduce a partially observable domain suited to this investigation. We then demonstrate that, through interaction with the environment, an agent can independently select predictions that resolve the partial observability, resulting in performance similar to that of expertly chosen value functions. By learning these predictions rather than manually specifying them, we enable the agent to identify useful predictions in a self-supervised manner, taking a step towards truly autonomous systems.
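To make "the accumulation of a future signal" concrete, the sketch below computes the quantity that a General Value Function estimates along a single finite trajectory. It assumes the common formulation in which a prediction is specified by a signal of interest (the cumulant) and a per-step continuation factor (the discount); the function name `gvf_returns` and its arguments are illustrative and are not taken from the manuscript, and the sketch does not implement the meta-gradient method itself.

```python
from typing import List, Sequence


def gvf_returns(cumulants: Sequence[float], discounts: Sequence[float]) -> List[float]:
    """Discounted accumulation of a cumulant along one finite trajectory.

    Assumed convention (illustrative, not from the manuscript):
    cumulants[t] is the signal observed on the transition out of step t,
    discounts[t] weights everything that follows that transition, so that
    G_t = cumulants[t] + discounts[t] * G_{t+1}, with G = 0 past the end.
    """
    if len(cumulants) != len(discounts):
        raise ValueError("cumulants and discounts must have the same length")

    returns = [0.0] * len(cumulants)
    running = 0.0
    # Work backwards so each return reuses the one that follows it.
    for t in reversed(range(len(cumulants))):
        running = cumulants[t] + discounts[t] * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A signal that appears only on the final transition, discounted by 0.5
    # per step, with a discount of 0 terminating the accumulation.
    print(gvf_returns([0.0, 0.0, 1.0], [0.5, 0.5, 0.0]))
```

Under this view, choosing a prediction amounts to choosing the cumulant and the discount (together with the policy that generates the trajectory); a learned General Value Function approximates the expected value of these returns from each state.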