LG MLJan 31, 2019

A Theory of Regularized Markov Decision Processes

Matthieu Geist, Bruno Scherrer, Olivier Pietquin

arXiv:1901.11275v235.5407 citations

Originality Incremental advance

AI Analysis

This foundational work provides a unified theoretical framework for regularized reinforcement learning, impacting the entire ML/AI field by connecting it to proximal convex optimization.

The authors developed a general theory of regularized Markov Decision Processes that extends existing reinforcement learning methods by considering a broader class of regularizers and the modified policy iteration framework, enabling error propagation analyses for algorithms like Trust Region Policy Optimization and Soft Q-learning.

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.

View on arXiv PDF

Similar