LGApr 21

Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

arXiv:2604.1969560.513 citations

Predicted impact top 36% in LG · last 90 daysOriginality Highly original

AI Analysis

Provides the first polynomial sample complexity guarantee for planning in entropy-regularized MDPs and games, addressing a key theoretical gap.

The paper introduces SmoothCruiser, a planning algorithm for entropy-regularized MDPs and games that achieves O~(1/ε⁴) sample complexity, which is polynomial and problem-independent, unlike non-regularized settings where no such guarantees exist.

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O~(1/epsilon^4) for a desired accuracy epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.

View on arXiv PDF

Similar