LGApr 21

Planning in entropy-regularized Markov decision processes and games

arXiv:2604.1969560.513 citations
Predicted impact top 36% in LG · last 90 daysOriginality Highly original
AI Analysis

Provides the first polynomial sample complexity guarantee for planning in entropy-regularized MDPs and games, addressing a key theoretical gap.

The paper introduces SmoothCruiser, a planning algorithm for entropy-regularized MDPs and games that achieves O~(1/ε⁴) sample complexity, which is polynomial and problem-independent, unlike non-regularized settings where no such guarantees exist.

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O~(1/epsilon^4) for a desired accuracy epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes