Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences
This work addresses a foundational problem in reinforcement learning for control systems, the ill-posedness of KL regularization, by providing a more robust regularization method. The contribution is incremental in that it builds on existing KL regularization frameworks.
The paper tackles two failure modes of KL divergence regularization in reinforcement learning: the divergence becomes infinite under support mismatch, and it becomes singular in low-noise limits. It introduces Wasserstein and Kalman-Wasserstein KL analogues that remain finite and well-posed, and it reports improved performance over KL-based regularization on a double integrator and a cart-pole example.
Kullback-Leibler divergence (KL) regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. Utilizing a unified information-geometric framework, we introduce (Kalman)-Wasserstein-based KL analogues by replacing the Fisher-Rao geometry in the dynamical formulation of the KL with transport-based geometries, and we derive closed-form values for common distribution families. These divergences remain finite under support mismatch and yield a geometric interpretation of regularization heuristics used in Kalman ensemble methods. We demonstrate the utility of these divergences in KL-regularized optimal control. In the fully tractable setting of linear time-invariant systems with Gaussian process noise, the classical KL reduces to a quadratic control penalty that becomes singular as process noise vanishes. Our variants remove this singularity, yielding well-posed problems. On a double integrator and a cart-pole example, the resulting controls outperform KL-based regularization.
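The low-noise singularity described above can be illustrated numerically. The sketch below is not the paper's method: it compares the classical KL between two Gaussians with a shared covariance, which equals 0.5 * d^T cov^{-1} d for mean shift d, against the squared 2-Wasserstein distance between the same Gaussians, which equals ||d||^2. The plain Wasserstein distance is used here only as a stand-in for a transport-based quantity; the paper's (Kalman)-Wasserstein KL analogues have their own closed forms, which are not reproduced here. The discretized double integrator (A, B), the step size dt, the state x, the control u, and the isotropic noise covariance sigma^2 * I are all assumed values chosen for illustration.

```python
import numpy as np


def kl_gauss_same_cov(mean_p, mean_q, cov):
    """KL(N(mean_p, cov) || N(mean_q, cov)) = 0.5 * d^T cov^{-1} d."""
    d = np.asarray(mean_p, dtype=float) - np.asarray(mean_q, dtype=float)
    return 0.5 * float(d @ np.linalg.solve(cov, d))


def w2_sq_gauss_same_cov(mean_p, mean_q):
    """Squared 2-Wasserstein distance between Gaussians with equal covariance.

    With equal covariances the covariance term vanishes, leaving ||d||^2.
    """
    d = np.asarray(mean_p, dtype=float) - np.asarray(mean_q, dtype=float)
    return float(d @ d)


# Assumed example: double integrator discretized with step dt.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
x = np.array([1.0, 0.0])  # assumed current state
u = np.array([1.0])       # assumed control input

controlled_mean = A @ x + B @ u  # next-state mean with control
passive_mean = A @ x             # next-state mean without control

for sigma in (1.0, 1e-1, 1e-2, 1e-3):
    cov = sigma**2 * np.eye(2)  # assumed isotropic process noise
    kl = kl_gauss_same_cov(controlled_mean, passive_mean, cov)
    w2 = w2_sq_gauss_same_cov(controlled_mean, passive_mean)
    print(f"sigma={sigma:g}  KL penalty={kl:.6g}  W2^2={w2:.6g}")
```

In this setting the KL penalty is a quadratic form in u weighted by the inverse noise covariance, so it grows like 1/sigma^2 as the process noise vanishes, while the transport-based quantity does not depend on sigma at all.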