LG AI ML

May 27, 2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist

DeepMind

arXiv:2205.14211v1

14.614 citations

h-index: 88

Originality: Incremental advance

AI Analysis

The result gives theoretical support for simple model-free RL algorithms: it shows that near-minimax-optimal sample complexity with a generative model is attainable without variance reduction.

The paper tackles the sample complexity of model-free reinforcement learning with a generative model, showing that mirror descent value iteration (MDVI) with KL divergence and entropy regularization is nearly minimax-optimal for finding an ε-optimal policy when ε is small.

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. Particularly, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses the Kullback-Leibler divergence and entropy regularization in its value and policy updates. Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small. This is the first theoretical result that demonstrates that a simple model-free algorithm without variance-reduction can be nearly minimax-optimal under the considered setting.
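To make the KL- and entropy-regularized updates concrete, the following is a minimal tabular sketch in the spirit of MDVI, not the paper's exact algorithm or its analyzed parameter choices. It assumes the standard closed form of the regularized greedy step, in which the new policy is proportional to `pi_prev ** (tau / (tau + lam)) * exp(q / (tau + lam))`, and it simulates the generative model by drawing next states from a known transition tensor. The names `P`, `r`, `tau`, `lam`, and `samples_per_pair` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np


def regularized_greedy(q, log_pi_prev, tau, lam):
    """Log-policy maximizing <pi, q> - tau * KL(pi || pi_prev) + lam * H(pi).

    The maximizer is proportional to
    pi_prev ** (tau / (tau + lam)) * exp(q / (tau + lam)).
    """
    logits = (tau * log_pi_prev + q) / (tau + lam)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def mdvi_sketch(P, r, gamma, tau, lam, num_iters, samples_per_pair, seed=0):
    """Tabular regularized value iteration with sampled next states.

    P: (S, A, S) transition probabilities, used only to simulate the
       generative model (an assumption of this sketch).
    r: (S, A) rewards.
    """
    rng = np.random.default_rng(seed)
    S, A = r.shape
    q = np.zeros((S, A))
    log_pi = np.full((S, A), -np.log(A))  # start from the uniform policy

    for _ in range(num_iters):
        # Policy update: KL- and entropy-regularized greedy step.
        new_log_pi = regularized_greedy(q, log_pi, tau, lam)
        pi = np.exp(new_log_pi)

        # Regularized state value under the new policy.
        kl = (pi * (new_log_pi - log_pi)).sum(axis=1)
        entropy = -(pi * new_log_pi).sum(axis=1)
        v = (pi * q).sum(axis=1) - tau * kl + lam * entropy

        # Value update: average v over sampled next states for each (s, a).
        next_v = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                nxt = rng.choice(S, size=samples_per_pair, p=P[s, a])
                next_v[s, a] = v[nxt].mean()

        q = r + gamma * next_v
        log_pi = new_log_pi

    return q, np.exp(log_pi)
```

Calling `mdvi_sketch` with a small transition tensor and reward matrix returns the final value estimate and policy. Note that no variance-reduction step appears anywhere in the loop; each iteration only averages fresh samples, which matches the kind of simple model-free scheme the abstract refers to.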
