Full Gradient Deep Reinforcement Learning for Average-Reward Criterion
This work addresses average-reward reinforcement learning, which matters for applications where long-run performance is the objective; the contribution is incremental, since it builds directly on prior discounted-reward methods.
The authors extend the Full Gradient DQN algorithm from discounted to average-reward Markov decision processes. Combined with either RVI Q-Learning or Differential Q-Learning, the Full Gradient variant converges faster than standard DQN across various tasks.
We extend the provably convergent Full Gradient DQN algorithm for discounted-reward Markov decision processes from Avrachenkov et al. (2021) to average-reward problems. We experimentally compare the widely used RVI Q-Learning with the recently proposed Differential Q-Learning in the neural function approximation setting, using both Full Gradient DQN and DQN. We also extend the approach to learning Whittle indices for Markovian restless multi-armed bandits. We observe that the proposed Full Gradient variant converges faster across different tasks.
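The difference between a full-gradient update and the usual semi-gradient (DQN-style) update can be illustrated with a minimal sketch. It uses linear function approximation instead of a neural network and a hypothetical two-state toy MDP. For RVI Q-Learning it assumes the reference function f(Q) = Q(0, 0). In both variants the loss is half the squared Bellman error. The semi-gradient update holds the bootstrapped target fixed, while the full-gradient update also differentiates the max term and, for RVI, the reference value. For Differential Q-Learning, the reward-rate estimate `r_bar` keeps its usual update; differentiating only with respect to `w` in the full-gradient branch is an assumption made here, not necessarily the paper's choice. All names (`rvi_update`, `differential_update`, `one_hot`, `STEP`) are illustrative, not the authors' code.

```python
from typing import Callable, List, Sequence, Tuple

Features = Callable[[int, int], List[float]]
Transition = Tuple[int, int, float, int]  # (s, a, r, s_next)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def greedy_action(w: List[float], phi: Features, s: int, actions: Sequence[int]) -> int:
    # max() keeps the first maximiser, so at ties one (sub)gradient of the max term is used.
    return max(actions, key=lambda b: dot(w, phi(s, b)))


def rvi_update(w: List[float], phi: Features, tr: Transition, actions: Sequence[int],
               ref: Tuple[int, int], alpha: float, full_gradient: bool) -> List[float]:
    """One step on 0.5 * delta**2 for RVI Q-learning with f(Q) = Q(ref)."""
    s, a, r, s_next = tr
    x_sa = phi(s, a)
    x_next = phi(s_next, greedy_action(w, phi, s_next, actions))
    x_ref = phi(*ref)
    delta = r - dot(w, x_ref) + dot(w, x_next) - dot(w, x_sa)
    if full_gradient:
        # d(delta)/dw = phi(s', a*) - phi(s, a) - phi(ref): the bootstrapped
        # target and the reference value f(Q) are differentiated as well.
        g = [n - c - f for n, c, f in zip(x_next, x_sa, x_ref)]
        return [wi - alpha * delta * gi for wi, gi in zip(w, g)]
    # Semi-gradient (DQN-style): the target r - f(Q) + max Q(s', .) is held fixed.
    return [wi + alpha * delta * c for wi, c in zip(w, x_sa)]


def differential_update(w: List[float], r_bar: float, phi: Features, tr: Transition,
                        actions: Sequence[int], alpha: float, eta: float,
                        full_gradient: bool) -> Tuple[List[float], float]:
    """One Differential Q-learning step; r_bar estimates the average reward."""
    s, a, r, s_next = tr
    x_sa = phi(s, a)
    x_next = phi(s_next, greedy_action(w, phi, s_next, actions))
    delta = r - r_bar + dot(w, x_next) - dot(w, x_sa)
    if full_gradient:
        # Assumption: differentiate delta w.r.t. w only; r_bar keeps its usual update.
        g = [n - c for n, c in zip(x_next, x_sa)]
        w = [wi - alpha * delta * gi for wi, gi in zip(w, g)]
    else:
        w = [wi + alpha * delta * c for wi, c in zip(w, x_sa)]
    return w, r_bar + eta * alpha * delta


# Hypothetical deterministic toy MDP: (s, a) -> (reward, s_next).
# Moving to state 1 and staying there earns reward 1 per step (optimal gain 1).
STEP = {(0, 0): (0.0, 0), (0, 1): (0.0, 1), (1, 0): (1.0, 1), (1, 1): (0.0, 0)}
ACTIONS = (0, 1)


def one_hot(s: int, a: int) -> List[float]:
    # Tabular features, a special case of linear function approximation.
    x = [0.0] * 4
    x[2 * s + a] = 1.0
    return x


def transitions() -> List[Transition]:
    return [(s, a, r, s2) for (s, a), (r, s2) in STEP.items()]


if __name__ == "__main__":
    for full in (False, True):
        w = [0.0] * 4
        for _ in range(2000):
            for tr in transitions():
                w = rvi_update(w, one_hot, tr, ACTIONS, (0, 0), 0.05, full)
        print("full" if full else "semi", [round(x, 3) for x in w])
```

In this toy MDP the RVI fixed point is Q = [1, 2, 3, 1] in the order (0,0), (0,1), (1,0), (1,1), with Q(0, 0) equal to the optimal gain of 1. Because the transitions are deterministic, squaring a single sampled Bellman error does not introduce the double-sampling bias that arises with stochastic transitions.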