LG | AI | Jul 8, 2024

Solving Multi-Model MDPs by Coordinate Ascent and Dynamic Programming

arXiv:2407.06329v1 | 10.46 citations | h-index: 22 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses robust policy optimization in uncertain environments, offering an incremental improvement over prior dynamic programming approaches for MMDPs.

The paper tackles Multi-model Markov Decision Processes (MMDPs), which are NP-hard to solve, as a way to compute policies that are robust to parameter uncertainty. It proposes CADP, a method that combines coordinate ascent with dynamic programming. CADP guarantees monotone policy improvements, provably never performs worse than earlier dynamic programming algorithms such as WSU, and outperforms existing methods on several benchmark problems.

The multi-model Markov decision process (MMDP) is a promising framework for computing policies that are robust to parameter uncertainty in MDPs. An MMDP seeks a policy that maximizes the expected return over a distribution of MDP models. Because MMDPs are NP-hard to solve, most methods resort to approximations. In this paper, we derive the policy gradient of MMDPs and propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs. The main innovation of CADP compared with earlier algorithms is its coordinate ascent perspective: it adjusts model weights iteratively, which guarantees monotone policy improvements until a local maximum is reached. A theoretical analysis of CADP proves that it never performs worse than previous dynamic programming algorithms like WSU. Our numerical results indicate that CADP substantially outperforms existing methods on several benchmark problems.
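The idea of alternating between recomputing model weights and running a backward dynamic programming pass can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a finite horizon, tabular models, and deterministic Markov policies, and the array layout (`P[m, s, a, s']`, `R[m, s, a]`), the function names, and the tie-breaking tolerance are all hypothetical choices made for illustration.

```python
import numpy as np

def evaluate(P, R, lam, mu, pi):
    """Weighted return sum_m lam[m] * rho_m(pi) of a Markov policy pi (T x S)."""
    M, S, _, _ = P.shape
    idx = np.arange(S)
    v = np.zeros((M, S))
    for t in reversed(range(pi.shape[0])):
        a = pi[t]
        # Per-model backup under the fixed policy.
        v = R[:, idx, a] + np.einsum("msn,mn->ms", P[:, idx, a, :], v)
    return float(lam @ (v @ mu))

def model_weights(P, lam, mu, pi):
    """b[t, m, s] = lam[m] * Pr_m(S_t = s | pi): joint weight of model and state."""
    M, S, _, _ = P.shape
    T = pi.shape[0]
    idx = np.arange(S)
    b = np.zeros((T, M, S))
    b[0] = lam[:, None] * mu[None, :]
    for t in range(T - 1):
        # Push probability mass forward through each model under pi[t].
        b[t + 1] = np.einsum("ms,msn->mn", b[t], P[:, idx, pi[t], :])
    return b

def cadp_sketch(P, R, lam, mu, pi, max_iters=100, tol=1e-12):
    """Coordinate-ascent-style loop (assumed structure, for illustration only)."""
    M, S, _, _ = P.shape
    T = pi.shape[0]
    idx = np.arange(S)
    pi = pi.copy()
    for _ in range(max_iters):
        b = model_weights(P, lam, mu, pi)      # weights from the current policy
        v = np.zeros((M, S))
        changed = False
        for t in reversed(range(T)):           # backward dynamic programming
            q = R + np.einsum("msan,mn->msa", P, v)    # per-model action values
            score = np.einsum("ms,msa->sa", b[t], q)   # weight-averaged values
            best = score.argmax(axis=1)
            # Switch actions only on strict improvement, so the return never drops.
            improve = score[idx, best] > score[idx, pi[t]] + tol
            if improve.any():
                pi[t, improve] = best[improve]
                changed = True
            v = q[:, idx, pi[t]]
        if not changed:                        # no coordinate can be improved
            break
    return pi

# Usage on a small random instance (synthetic data, not a benchmark from the paper).
rng = np.random.default_rng(0)
M, S, A, T = 3, 4, 2, 5
P = rng.random((M, S, A, S))
P /= P.sum(axis=-1, keepdims=True)
R = rng.random((M, S, A))
lam = np.full(M, 1.0 / M)
mu = np.full(S, 1.0 / S)
pi0 = np.zeros((T, S), dtype=int)
pi1 = cadp_sketch(P, R, lam, mu, pi0)
print(evaluate(P, R, lam, mu, pi0), evaluate(P, R, lam, mu, pi1))
```

The weights `b[t]` depend only on the policy at earlier time steps, while the values `v` depend only on later ones, so each backward update at time `t` is a single coordinate step that cannot reduce the weighted return. The loop stops at a policy that no such step improves, which is a local rather than a global maximum.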

