OC LG RO SY
Dec 5, 2025

Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions

Ajinkya Bhole, Mohammad Mahmoudi Filabadi, Guillaume Crevecoeur, Tom Lefebvre

arXiv:2512.06109v2
1 citation

Originality: Highly original

AI Analysis

This work gives researchers in control theory and reinforcement learning a common framework for several stochastic optimal control formulations, and extends computationally favorable properties, such as linear Bellman equations and path integral solutions, to a broader class of control problems.

The paper unifies several stochastic optimal control formulations by introducing a generalized Kullback-Leibler (KL) regularization framework that penalizes policies and transitions separately, each with its own weight. The framework recovers classical problems such as Stochastic Optimal Control and Risk-Sensitive Optimal Control, and the paper shows that iterating the soft-policy solutions recovers the solutions of the original problems. In the specific case where the policy and transition weights coincide, the formulation admits a linear Bellman equation and a path integral solution.

This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler (KL) regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure from which various control problems can be recovered. These include the classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. We refer to the latter as soft-policy SOC and RSOC; they provide alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problems. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties, such as a linear Bellman equation, a path integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
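
To make the phrase "linear Bellman equation" concrete, the following is a minimal sketch of the well-known linearly solvable MDP (first-exit, KL-control) setting, not the paper's generalized formulation. It assumes a finite state space, a passive transition matrix P, and a state cost q. The function names, the five-state chain, and the cost values are hypothetical and chosen only for illustration. In this setting the desirability z = exp(-v) satisfies z = exp(-q) * (P z) on non-terminal states, so the value function follows from one linear solve.

```python
import numpy as np

def solve_linear_bellman(P, q, terminal):
    """Solve z = exp(-q) * (P @ z) on interior states,
    with boundary condition z = exp(-q) on terminal states."""
    interior = ~terminal
    z = np.zeros(len(q))
    z[terminal] = np.exp(-q[terminal])
    G = np.diag(np.exp(-q[interior]))            # per-state discount exp(-q)
    P_ii = P[np.ix_(interior, interior)]         # interior -> interior
    P_it = P[np.ix_(interior, terminal)]         # interior -> terminal
    # Rearranged linear system: (I - G P_ii) z_i = G P_it z_t
    A = np.eye(int(interior.sum())) - G @ P_ii
    b = G @ P_it @ z[terminal]
    z[interior] = np.linalg.solve(A, b)
    return z

def optimal_transitions(P, z):
    """Optimal controlled dynamics: u*(x'|x) proportional to P(x'|x) * z(x')."""
    U = P * z[None, :]
    return U / U.sum(axis=1, keepdims=True)

# Hypothetical example: random walk on a 5-state chain, state 4 is the goal.
n = 5
P = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        P[i, min(max(j, 0), n - 1)] += 0.5       # reflect at the ends
q = np.full(n, 0.1)                              # running state cost
q[-1] = 0.0                                      # no cost at the goal
terminal = np.zeros(n, dtype=bool)
terminal[-1] = True

z = solve_linear_bellman(P, q, terminal)
v = -np.log(z)                                   # value function
U = optimal_transitions(P, z)
print("value function:", v)
print("controlled step 1 -> 2 probability:", U[1, 2])
```

The nonlinear soft Bellman equation v(x) = q(x) - log sum_x' P(x'|x) exp(-v(x')) becomes linear after the change of variables z = exp(-v), which is why no iterative minimization appears in the sketch.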
