LG SY OC CO ML

May 9, 2012

New inference strategies for solving Markov Decision Processes using reversible jump MCMC

Matthias Hoffman, Hendrik Kueck, Nando de Freitas, Arnaud Doucet

arXiv:1205.2643v1

36 citations

Originality: Incremental advance

AI Analysis

This work addresses the scalability of MCMC methods for solving Markov Decision Processes. The advance is incremental, since it builds on prior inference-based techniques.

The paper aims to make MCMC-based inference more practical for parameterized control problems in higher-dimensional spaces. It introduces a new target distribution that incorporates more reward information from sampled trajectories, and it breaks the strong correlations between policy parameters and sampled trajectories so that the sampler can move more freely. The stated result is more efficient sampling and estimates of the optimal policy.

In this paper we build on previous work which uses inference techniques, in particular Markov Chain Monte Carlo (MCMC) methods, to solve parameterized control problems. We propose a number of modifications in order to make this approach more practical in general, higher-dimensional spaces. We first introduce a new target distribution which is able to incorporate more reward information from sampled trajectories. We also show how to break strong correlations between the policy parameters and sampled trajectories in order to sample more freely. Finally, we show how to incorporate these techniques in a principled manner to obtain estimates of the optimal policy.
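To make the general idea of solving a control problem by MCMC concrete, here is a minimal Python sketch. It is not the paper's method: it uses neither the new target distribution nor reversible jump moves, and every name, the one-step toy problem, the reward, the prior and the tuning constants are assumptions chosen for illustration. The sketch assumes one common formulation in which the sampler targets a joint density over a policy parameter and a trajectory that is proportional to prior times trajectory probability times a non-negative reward, so that the marginal over the parameter favours policies with high expected reward.

```python
import random
import math


def reward(action):
    # Hypothetical non-negative reward, largest when the action equals 2.
    return math.exp(-(action - 2.0) ** 2)


def prior(theta):
    # Hypothetical unnormalized uniform prior on [-5, 5] for the policy parameter.
    return 1.0 if -5.0 <= theta <= 5.0 else 0.0


def sample_policy_parameters(n_steps=20000, step=1.0, noise=0.5, seed=0):
    """Metropolis-Hastings over (theta, action) for a one-step toy problem.

    The policy draws action ~ Normal(theta, noise). The assumed target is
    prior(theta) * p(action | theta) * reward(action).
    """
    rng = random.Random(seed)
    theta = 0.0
    action = rng.gauss(theta, noise)  # one-step "trajectory" under the policy
    samples = []
    for _ in range(n_steps):
        # Symmetric random-walk proposal for the policy parameter.
        theta_new = theta + rng.gauss(0.0, step)
        # Propose a fresh trajectory from the policy itself, so the
        # p(action | theta) terms cancel in the acceptance ratio.
        action_new = rng.gauss(theta_new, noise)
        ratio = (prior(theta_new) * reward(action_new)) / (
            prior(theta) * reward(action)
        )
        if rng.random() < ratio:
            theta, action = theta_new, action_new
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = sample_policy_parameters()[2000:]  # discard a short burn-in
    print("mean sampled policy parameter:", sum(draws) / len(draws))
```

In this toy setting the sampled parameters should cluster around the value whose actions earn the most reward. Because each proposal redraws the trajectory together with the parameter, the example also hints at why the coupling between parameters and trajectories matters for how freely such a sampler can move.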
