Concentration bounds for SSP Q-learning for average cost MDPs
This work provides theoretical guarantees for a reinforcement learning method in the average cost setting. The contribution is incremental: it builds on existing Q-learning and shortest path approaches.
The authors derive a concentration bound for a Q-learning algorithm applied to average cost Markov decision processes via a stochastic shortest path (SSP) formulation, and compare it numerically with a scheme based on relative value iteration (RVI).
We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
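For orientation, the two schemes being compared can be sketched by their update rules as they are commonly stated in the literature. The notation is an assumption made for this sketch and is not taken from the text above: k(i,u) is the one-step cost, ξ is the next state sampled from state i under action u, i_0 is a fixed reference state, a(n) and b(n) are step sizes, Γ is a projection onto a bounded interval, and f is a reference function such as f(Q) = Q(i_0, u_0).

```latex
% Sketch only; all symbols are assumed notation, not the paper's own.
\begin{align*}
% SSP Q-learning: \lambda_n is a running estimate of the optimal average cost
Q_{n+1}(i,u) &= Q_n(i,u) + a(n)\Big[ k(i,u) + \mathbb{1}\{\xi \neq i_0\}\,\min_{v} Q_n(\xi,v) - \lambda_n - Q_n(i,u) \Big], \\
\lambda_{n+1} &= \Gamma\Big( \lambda_n + b(n)\,\min_{v} Q_n(i_0,v) \Big), \qquad b(n) = o\big(a(n)\big), \\[6pt]
% RVI Q-learning: the offset f(Q_n) replaces the separate estimate \lambda_n
Q_{n+1}(i,u) &= Q_n(i,u) + a(n)\Big[ k(i,u) + \min_{v} Q_n(\xi,v) - f(Q_n) - Q_n(i,u) \Big].
\end{align*}
```

In the SSP form, the indicator cuts the bootstrap term whenever the sampled next state is the reference state i_0, so each visit to i_0 acts like termination of a shortest path problem with per-step cost k(i,u) - λ_n. The average cost estimate λ_n is updated on a slower time scale than Q_n.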