A constrained optimization perspective on actor critic algorithms and application to network routing
This work addresses the need for reliable reinforcement learning algorithms in domains such as network routing. The contribution appears incremental, since it builds on existing actor-critic frameworks by adding a constrained optimization perspective.
The authors tackle the problem of designing a convergent actor-critic algorithm for discounted reward Markov decision processes. The result is a novel method with guaranteed convergence to an optimal policy, whose practicality is demonstrated on a network routing application.
We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application.
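For readers unfamiliar with the actor-critic setting, here is a minimal sketch of a generic tabular one-step actor-critic on a discounted reward MDP. It is not the algorithm proposed in this work: the actor below uses a standard softmax policy-gradient step, not the descent direction derived from the non-linear optimization problem, and it carries no convergence guarantee. The function name `actor_critic`, the step sizes, and the two-state toy MDP are all illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(P, R, gamma=0.9, steps=20000,
                 alpha_critic=0.05, alpha_actor=0.01, seed=0):
    """Generic tabular one-step actor-critic (NOT the paper's method).

    P[s][a] is a list of next-state probabilities and R[s][a] is the
    reward for taking action a in state s. All names and default
    values here are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states                                   # critic: state values
    theta = [[0.0] * n_actions for _ in range(n_states)]   # actor: preferences
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=pi)[0]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # Temporal-difference error under the discounted criterion.
        delta = R[s][a] + gamma * V[s_next] - V[s]
        # Critic update: move V(s) toward the one-step bootstrapped target.
        V[s] += alpha_critic * delta
        # Actor update: softmax policy-gradient step scaled by the TD error.
        for b in range(n_actions):
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += alpha_actor * delta * grad_log_pi
        s = s_next
    return V, [softmax(row) for row in theta]


# Toy MDP (an assumption): two states, two actions, uniform transitions.
# Action 1 always pays reward 1 and action 0 pays 0, so the optimal
# policy picks action 1 in every state.
P = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
R = [[0.0, 1.0],
     [0.0, 1.0]]

V, pi = actor_critic(P, R)
```

On this toy problem the learned policy should place most of its probability on action 1 in both states. A routing application would replace `P` and `R` with a model or simulator of the network, which this sketch does not attempt.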