Discovery of Useful Questions as Auxiliary Tasks
This work addresses the challenge of improving data efficiency and representation learning in reinforcement learning agents. The contribution is incremental in that it builds on existing general value function (GVF) and meta-gradient methods.
The paper tackles the problem of reinforcement learning agents discovering their own auxiliary tasks (questions) to improve learning efficiency. It demonstrates that meta-learned auxiliary tasks build more useful representations than hand-designed ones, and that they can improve the data efficiency of an actor-critic agent on Atari 2600 games.
Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value functions or GVFs, a fairly rich form of knowledge representation. Specifically, our method uses non-myopic meta-gradients to learn GVF-questions such that learning answers to them, as an auxiliary task, induces useful representations for the main task faced by the RL agent. We demonstrate that auxiliary tasks based on the discovered GVFs are sufficient, on their own, to build representations that support main task learning, and that they do so better than popular hand-designed auxiliary tasks from the literature. Furthermore, we show, in the context of Atari 2600 videogames, how such auxiliary tasks, meta-learned alongside the main task, can improve the data efficiency of an actor-critic agent.
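To make the notion of a GVF question concrete, the following is a minimal sketch, not the authors' implementation, of the prediction target that an answer to such a question approximates: a discounted sum of a cumulant signal. In the method described above, the quantities defining the question are what the meta-gradients learn; here the cumulants and discounts are fixed lists. The function name `gvf_returns` and the `bootstrap` parameter are assumptions introduced for illustration.

```python
from typing import List, Sequence


def gvf_returns(cumulants: Sequence[float],
                discounts: Sequence[float],
                bootstrap: float = 0.0) -> List[float]:
    """Compute GVF targets G_t = c_t + gamma_t * G_{t+1} for one trajectory.

    cumulants[t] is the signal accumulated at step t (the reward, in an
    ordinary value function); discounts[t] weights everything after step t.
    bootstrap is the (hypothetical) value estimate used past the last step.
    """
    if len(cumulants) != len(discounts):
        raise ValueError("cumulants and discounts must have equal length")
    returns = [0.0] * len(cumulants)
    g = bootstrap
    # Sweep backwards so each target reuses the one that follows it.
    for t in reversed(range(len(cumulants))):
        g = cumulants[t] + discounts[t] * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    # A discount of 0.0 at the last step cuts off the bootstrap value.
    print(gvf_returns([1.0, 0.0, 2.0], [0.5, 0.5, 0.0], bootstrap=10.0))
    # prints [1.5, 1.0, 2.0]
```

Setting the cumulant to the environment reward and the discount to a constant recovers an ordinary value-function target, which is why GVFs are described as a fairly rich form of knowledge representation.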