LG AI · Feb 2, 2023

A general Markov decision process formalism for action-state entropy-regularized reward maximization

Dmytro Grytskyy, Jorge Ramírez-Ruiz, Rubén Moreno-Bote

arXiv:2302.01098v1 · 4 citations · h-index: 27

Originality: Highly original

AI Analysis

This work provides a foundational framework for researchers in reinforcement learning, addressing a theoretical bottleneck in entropy regularization methods.

The authors address the problem of unifying the various forms of entropy regularization in Markov decision processes, which are used for regularization, generalization, and robust learning. They develop a general dual function formalism that transforms the constrained optimization problem into an unconstrained convex one for any mixture of action and state entropies.

Previous work has separately addressed different forms of action, state and action-state entropy regularization, pure exploration and space occupation. These problems have become extremely relevant for regularization, generalization, speeding up learning and providing robust solutions at unprecedented levels. However, the solutions to these problems are heterogeneous, ranging from convex to non-convex optimization and from unconstrained to constrained optimization. Here we provide a general dual function formalism that transforms the constrained optimization problem into an unconstrained convex one for any mixture of action and state entropies. The cases with pure action entropy and pure state entropy are understood as limits of the mixture.
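To make "a mixture of action and state entropies" concrete, the following minimal sketch evaluates one plausible form of such an objective for a fixed policy on a tiny tabular MDP. Everything specific here is an assumption made for illustration and is not taken from the paper: the average-reward (stationary) setting, the weights `alpha` and `beta`, the function names, and the two-state example. It only evaluates the objective; it does not implement the dual function formalism.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def stationary_distribution(P_pi, iters=10_000, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix via lazy power iteration.

    Averaging d with d @ P_pi keeps the same fixed point but avoids
    oscillation on periodic chains. Assumes the stationary distribution
    is unique.
    """
    n = len(P_pi)
    d = [1.0 / n] * n
    for _ in range(iters):
        stepped = [sum(d[s] * P_pi[s][t] for s in range(n)) for t in range(n)]
        new = [0.5 * a + 0.5 * b for a, b in zip(d, stepped)]
        if max(abs(a - b) for a, b in zip(new, d)) < tol:
            return new
        d = new
    return d

def regularized_objective(P, r, pi, alpha, beta):
    """Average reward + alpha * H(A|S) + beta * H(S) under policy pi.

    P[s][a][t] : transition probability from state s to t under action a
    r[s][a]    : reward for taking action a in state s
    pi[s][a]   : probability of action a in state s
    alpha, beta: hypothetical weights on action and state entropy
    """
    n_states = len(P)
    n_actions = len(P[0])
    # State-to-state transition matrix induced by the policy.
    P_pi = [[sum(pi[s][a] * P[s][a][t] for a in range(n_actions))
             for t in range(n_states)] for s in range(n_states)]
    d = stationary_distribution(P_pi)
    avg_reward = sum(d[s] * pi[s][a] * r[s][a]
                     for s in range(n_states) for a in range(n_actions))
    action_entropy = sum(d[s] * entropy(pi[s]) for s in range(n_states))
    state_entropy = entropy(d)
    return avg_reward + alpha * action_entropy + beta * state_entropy

# Hypothetical two-state MDP: action 0 stays, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
r = [[1.0, 0.0],   # reward only for staying in state 0
     [0.0, 0.0]]

uniform = [[0.5, 0.5], [0.5, 0.5]]
greedy = [[1.0, 0.0], [0.0, 1.0]]  # stay in state 0, leave state 1

J_uniform = regularized_objective(P, r, uniform, alpha=1.0, beta=1.0)
J_greedy = regularized_objective(P, r, greedy, alpha=1.0, beta=1.0)
```

In this toy setting with both weights set to 1, the uniform policy scores higher than the reward-greedy policy, because the greedy policy collapses both the action and the state distributions. Setting `beta` to 0 or `alpha` to 0 gives the pure action-entropy and pure state-entropy objectives, respectively.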
