AIMay 14, 2012

Approximate Modified Policy Iteration

Bruno Scherrer, Victor Gabillon, Mohammad Ghavamzadeh, Matthieu Geist

arXiv:1205.3054v253 citations

Originality Incremental advance

AI Analysis

This work addresses a gap in dynamic programming theory for researchers, but it is incremental as it extends known methods without introducing a new paradigm.

The paper tackles the lack of study on approximate modified policy iteration (AMPI) for large or infinite state/action spaces by proposing three implementations based on existing approximate dynamic programming algorithms, with error propagation analyses that unify previous work and a finite-sample analysis showing control over error balance.

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximation form which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy and value iteration. On the last classification-based implementation, we develop a finite-sample analysis that shows that MPI's main parameter allows to control the balance between the estimation error of the classifier and the overall value function approximation.

View on arXiv PDF

Similar