Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
This work addresses efficiency issues in reinforcement learning for practitioners dealing with MDPs, though it is incremental as it builds on existing RL methods by incorporating structural properties.
The paper tackles the problem of high computational and storage complexity in solving Markov Decision Processes (MDPs) by proposing a structure-aware reinforcement learning algorithm that exploits the ordered multi-threshold structure of optimal policies, resulting in faster convergence and reduced complexities compared to classical RL algorithms.
To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms which do not consider the structural properties of the optimal policy, we propose a structure-aware learning algorithm to exploit the ordered multi-threshold structure of the optimal policy, if any. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Due to the reduction in the policy space, the proposed algorithm provides remarkable improvements in storage and computational complexities over classical RL algorithms. Simulation results establish that the proposed algorithm converges faster than other RL algorithms.