LG · LO · SY · Jun 12, 2021

Model-free Reinforcement Learning for Branching Markov Decision Processes

arXiv:2106.06777v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses control problems in complex systems such as BMDPs. The contribution is incremental, in that it extends existing techniques to a new model.

The paper tackles the problem of optimal control in Branching Markov Decision Processes (BMDPs) by generalizing model-free reinforcement learning techniques to compute an optimal strategy for unknown BMDPs, with implementation results showing the approach is practical.

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In BMCs, every entity of the same type evolves according to the same probabilistic pattern; BMDPs instead allow an external controller to pick from a range of options. This permits us to study the best-case and worst-case behaviour of the system. We generalise model-free reinforcement learning techniques to compute, in the limit, an optimal control strategy for an unknown BMDP. We present results of an implementation that demonstrate the practicality of the approach.
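The abstract does not spell out the learning rule, so the following is only a rough illustration of what "model-free" learning on a branching process can look like. It is a minimal sketch that assumes a discounted total-reward objective (discount `gamma` applied once per generation) and independently evolving entities. Under those assumptions the value of a population is the sum of the values of its members, so a Q-learning update for a (type, action) pair can bootstrap from the sum of its children's values. The toy model `TOY_BMDP`, the simulator `sample_step` and the function `branching_q_learning` are hypothetical names introduced here; this is not presented as the paper's algorithm.

```python
import random
from collections import defaultdict

# Hypothetical toy BMDP (not taken from the paper). For each entity type and
# action: (immediate reward, [(probability, tuple of offspring types), ...]).
TOY_BMDP = {
    "A": {
        "safe": (1.0, [(1.0, ())]),                      # die, earning 1
        "split": (0.0, [(0.5, ("A", "B")), (0.5, ())]),  # branch or die
    },
    "B": {
        "stop": (2.0, [(1.0, ())]),                      # die, earning 2
    },
}


def sample_step(entity_type, action, rng):
    """Black-box simulator: the learner only observes (reward, children)."""
    reward, outcomes = TOY_BMDP[entity_type][action]
    u, acc = rng.random(), 0.0
    for prob, children in outcomes:
        acc += prob
        if u < acc:
            return reward, children
    return reward, outcomes[-1][1]  # guard against rounding in the probabilities


def branching_q_learning(root="A", episodes=5000, gamma=0.9, epsilon=0.2,
                         max_entities=1000, seed=0):
    """Epsilon-greedy Q-learning over entity types (illustrative sketch)."""
    rng = random.Random(seed)
    # Only the action names are read from the model, never the probabilities.
    actions = {t: list(acts) for t, acts in TOY_BMDP.items()}
    q = {t: {a: 0.0 for a in acts} for t, acts in actions.items()}
    visits = defaultdict(int)

    def value(t):
        return max(q[t].values())

    for _ in range(episodes):
        population = [root]  # the current collection of live entities
        processed = 0        # cap on work per episode, as a safety net
        while population and processed < max_entities:
            t = population.pop()
            processed += 1
            if rng.random() < epsilon:
                a = rng.choice(actions[t])  # explore
            else:
                a = max(q[t], key=q[t].get)  # exploit (ties: first action)
            reward, children = sample_step(t, a, rng)
            # Assumed additive objective: each child's subtree is worth that
            # child's own value, discounted by one generation.
            target = reward + gamma * sum(value(c) for c in children)
            visits[(t, a)] += 1
            alpha = 1.0 / visits[(t, a)]  # decaying step size per (type, action)
            q[t][a] += alpha * (target - q[t][a])
            population.extend(children)
    return q


if __name__ == "__main__":
    q = branching_q_learning()
    print({t: max(acts, key=acts.get) for t, acts in q.items()})
```

The learner touches the model only through `sample_step`, which is what makes it model-free in this sketch. For this toy model the fixed point can be checked by hand: Q(B, stop) = 2, Q(A, safe) = 1, and Q(A, split) = 0.5 · 0.9 · (Q(A, split) + 2), i.e. Q(A, split) = 0.9 / 0.55 ≈ 1.64, so a converged learner should prefer `split` for type A.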

