Real-Time Recurrent Reinforcement Learning
This work develops a biologically inspired model of reinforcement learning for partially observable environments. The contribution is incremental in that it builds on existing RL and neural network concepts.
The paper introduces a biologically plausible RL framework, real-time recurrent reinforcement learning (RTRRL), for partially observable reinforcement learning tasks, and reports that the method solves a diverse set of such tasks.
We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a Meta-RL architecture, resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm, exploiting temporal difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
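To make part (2) concrete, the following is a minimal sketch of temporal difference learning with eligibility traces, written as tabular TD(λ) value estimation on a small random-walk chain. It is not the RTRRL algorithm: there is no policy, no recurrent backbone, and no online gradient computation. The environment, the function name `td_lambda_random_walk`, and all hyperparameter values are assumptions chosen purely for illustration.

```python
import random


def td_lambda_random_walk(n_states=5, episodes=5000, alpha=0.01,
                          gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Illustrative environment (an assumption, not from the paper): a chain of
    n_states states. Each step moves left or right with equal probability.
    Leaving the chain on the right gives reward 1, on the left reward 0.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states              # value estimates V(s)
    for _ in range(episodes):
        traces = [0.0] * n_states          # traces are reset every episode
        state = n_states // 2              # start in the middle of the chain
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            v_next = 0.0 if done else values[next_state]

            # TD error: one-step target minus the current estimate.
            delta = reward + gamma * v_next - values[state]

            # Decay every trace, then bump the trace of the visited state.
            for s in range(n_states):
                traces[s] *= gamma * lam
            traces[state] += 1.0

            # Every state is updated in proportion to its trace, so credit
            # for the TD error flows back to recently visited states.
            for s in range(n_states):
                values[s] += alpha * delta * traces[s]

            if done:
                break
            state = next_state
    return values


if __name__ == "__main__":
    print(td_lambda_random_walk())
```

Each update uses only the current transition and the running traces, so no trajectory needs to be stored or replayed. That locality in time is the property that makes trace-based TD methods a natural fit for the online setting described above.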