Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
This work addresses computational inefficiency in resource allocation problems for domains like logistics and finance, offering a novel method but with incremental improvements over existing planning techniques.
The paper tackles the problem of inefficient planning in Markov Decision Processes (MDPs) by introducing Structurally Decomposed MDPs (SD-MDPs) that use causal disentanglement, resulting in log-linear complexity O(T log T) for value estimation and improved policy performance in logistics and finance domains.
Markov Decision Processes (MDPs), as a general-purpose framework, often overlook the benefits of incorporating the causal structure of the transition and reward dynamics. For a subclass of resource allocation problems, we introduce the Structurally Decomposed MDP (SD-MDP), which leverages causal disentanglement to partition an MDP's temporal causal graph into independent components. By exploiting this disentanglement, SD-MDP enables dimensionality reduction and computational efficiency gains in optimal value function estimation. We reduce the sequential optimization problem to a fractional knapsack problem with log-linear complexity $O(T \log T)$, outperforming traditional stochastic programming methods that exhibit polynomial complexity with respect to the time horizon $T$. Additionally, SD-MDP's computational advantages are independent of state-action space size, making it viable for high-dimensional spaces. Furthermore, our approach integrates seamlessly with Monte Carlo Tree Search (MCTS), achieving higher expected rewards under constrained simulation budgets while providing a vanishing simple regret bound. Empirical results demonstrate superior policy performance over benchmarks across various logistics and finance domains.