Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning
This work addresses the scalability and robustness of POMDP solving for applications that require reliable decision-making under uncertainty. Its contribution is an incremental improvement over existing solvers, achieved with a novel hybrid of deep reinforcement learning and finite-state controller extraction.
The authors tackle the scalability limitations of solving partially observable Markov decision processes (POMDPs) and hidden-model POMDPs (HM-POMDPs) with the Lexpop framework, which trains neural policies using deep reinforcement learning and then extracts finite-state controllers that can be formally evaluated. On problems with large state spaces, Lexpop outperforms state-of-the-art solvers.
Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its worst-case POMDP. Using a set of such POMDPs, we iteratively train a robust neural policy and consequently extract a robust controller. Our experiments show that on problems with large state spaces, Lexpop outperforms state-of-the-art solvers for POMDPs as well as HM-POMDPs.
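To illustrate why a finite-state controller can be evaluated exactly, and how a worst-case POMDP might be selected from a finite set, here is a minimal sketch. It is not the authors' implementation. It assumes a discounted-reward objective, deterministic observations, and a deterministic controller, none of which is stated in the abstract. The names `evaluate_fsc` and `worst_case`, and the toy two-state models, are hypothetical.

```python
import numpy as np

def evaluate_fsc(T, O, R, action, update, gamma=0.95):
    """Exact discounted value of a deterministic FSC on one POMDP (assumed objective).

    T[s, a, s2]  transition probabilities
    O[s]         observation emitted in state s (assumed deterministic)
    R[s, a]      immediate reward
    action[n, o] action chosen in memory node n under observation o
    update[n, o] successor memory node
    Returns V[s, n], the value of starting in state s with memory node n.
    """
    S, N = T.shape[0], action.shape[0]
    P = np.zeros((S * N, S * N))  # induced Markov chain over (state, node) pairs
    r = np.zeros(S * N)
    for s in range(S):
        o = O[s]
        for n in range(N):
            a, n2 = action[n, o], update[n, o]
            i = s * N + n
            r[i] = R[s, a]
            for s2 in range(S):
                P[i, s2 * N + n2] += T[s, a, s2]
    # The policy is fixed, so the Bellman equation is linear: V = r + gamma * P V.
    V = np.linalg.solve(np.eye(S * N) - gamma * P, r)
    return V.reshape(S, N)

def worst_case(pomdps, action, update, s0=0, n0=0, gamma=0.95):
    """Index and value of the POMDP on which the controller performs worst."""
    values = [evaluate_fsc(T, O, R, action, update, gamma)[s0, n0]
              for (T, O, R) in pomdps]
    k = int(np.argmin(values))
    return k, float(values[k])

# Toy hidden-model set: two POMDPs that share states, actions, and observations.
O = np.array([0, 0])                    # both states look identical
R = np.array([[1.0, 0.0], [0.0, 0.0]])  # reward only for action 0 in state 0

T_a = np.zeros((2, 2, 2))
T_a[0, :, 0] = 1.0                      # model A: state 0 is absorbing
T_a[1, :, 1] = 1.0

T_b = np.zeros((2, 2, 2))
T_b[0, :, 1] = 1.0                      # model B: state 0 always leads to state 1
T_b[1, :, 1] = 1.0

pomdps = [(T_a, O, R), (T_b, O, R)]

# One-node controller that always plays action 0.
action = np.array([[0]])
update = np.array([[0]])

k, v = worst_case(pomdps, action, update)
print(k, v)
```

In this toy set the controller collects reward forever in model A but only once in model B, so model B is returned as the worst case. In a loop like the one the abstract describes, such a worst-case model would be added to the set of POMDPs used for further training of the neural policy.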