Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process
This work addresses computational efficiency in sequential decision-making for researchers and practitioners working with large state spaces. Its contribution is incremental, since it extends existing decomposition techniques for Markov chains to MDPs.
The authors tackle the computational challenge of solving large Markov Decision Processes (MDPs) by introducing a structured class called Single-Input Superstate Decomposable MDPs (SISDMDPs), which enables an exact and efficient policy evaluation method applicable to both average-reward and discounted-reward formulations.
Solving Markov Decision Processes (MDPs) remains a central challenge in sequential decision-making, especially when dealing with large state spaces and long-term optimization criteria. A key step in Bellman dynamic programming algorithms is policy evaluation, which becomes computationally demanding in infinite-horizon settings such as average-reward or discounted-reward formulations. In the context of Markov chains, aggregation and disaggregation techniques have long been used to reduce complexity by exploiting structural decompositions. In this work, we extend these principles to a structured class of MDPs. We define the Single-Input Superstate Decomposable Markov Decision Process (SISDMDP), which combines Chiu's single-input decomposition with Robertazzi's single-cycle recurrence property. When a policy induces this structure, the resulting transition graph can be decomposed into interacting components with centralized recurrence. We develop an exact and efficient policy evaluation method based on this structure. This yields a scalable solution applicable to both average-reward and discounted-reward MDPs.
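To make the cost of the policy evaluation step concrete, the sketch below shows a generic baseline for the discounted-reward case: for a fixed policy, it iterates the fixed-point equation v = r + gamma * P v until convergence. This is not the SISDMDP method described above, and it exploits no decomposition. The function name `evaluate_policy_discounted`, the two-state transition matrix `P`, the reward vector `r`, and the discount factor `gamma` are all hypothetical, chosen only for illustration.

```python
def evaluate_policy_discounted(P, r, gamma, tol=1e-12, max_iter=100_000):
    """Baseline iterative policy evaluation for a fixed policy.

    P[s][t] is the probability of moving from state s to state t under
    the policy, r[s] is the expected one-step reward in state s, and
    0 <= gamma < 1 is the discount factor. Returns an approximation of
    the value vector v satisfying v = r + gamma * P v.
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        # One sweep of the Bellman operator for the fixed policy.
        new_v = [
            r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once successive iterates agree up to the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state chain induced by some fixed policy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 0.0]
gamma = 0.9

v = evaluate_policy_discounted(P, r, gamma)
```

Each sweep of this baseline touches every entry of a dense transition matrix, so its cost grows quadratically with the number of states per sweep. That growth is the kind of burden a structural decomposition is meant to reduce.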