LGMLDec 21, 2019

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

arXiv:1912.10325v329 citations
Originality Incremental advance
AI Analysis

This work addresses efficiency issues in reinforcement learning for practitioners dealing with MDPs, though it is incremental as it builds on existing RL methods by incorporating structural properties.

The paper tackles the problem of high computational and storage complexity in solving Markov Decision Processes (MDPs) by proposing a structure-aware reinforcement learning algorithm that exploits the ordered multi-threshold structure of optimal policies, resulting in faster convergence and reduced complexities compared to classical RL algorithms.

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms which do not consider the structural properties of the optimal policy, we propose a structure-aware learning algorithm to exploit the ordered multi-threshold structure of the optimal policy, if any. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Due to the reduction in the policy space, the proposed algorithm provides remarkable improvements in storage and computational complexities over classical RL algorithms. Simulation results establish that the proposed algorithm converges faster than other RL algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes