LG, AI | Aug 6, 2025

Symmetric Behavior Regularization via Taylor Expansion of Symmetry

arXiv:2508.04225v2 | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in offline RL: symmetric divergences have so far been impractical for behavior regularization. It offers an incremental improvement by making them usable.

The paper tackles the challenge of using symmetric divergences in offline reinforcement learning by introducing a Taylor expansion approach to obtain analytic policies and mitigate numerical issues, resulting in the Sf-AC algorithm that performs competitively on distribution approximation and MuJoCo benchmarks.

This paper introduces symmetric divergences to behavior regularization policy optimization (BRPO) to establish a novel offline RL framework. Existing methods focus on asymmetric divergences such as KL to obtain analytic regularized policies and a practical minimization objective. We show that symmetric divergences do not permit an analytic policy when used as regularization and can incur numerical issues when used as the loss. We tackle these challenges using the Taylor series of the $f$-divergence. Specifically, we prove that an analytic policy can be obtained with a finite series. For the loss, we observe that symmetric divergences can be decomposed into an asymmetry term and a conditional symmetry term; Taylor-expanding the latter alleviates numerical issues. Putting these together, we propose Symmetric $f$ Actor-Critic (S$f$-AC), the first practical BRPO algorithm with symmetric divergences. Experimental results on distribution approximation and MuJoCo verify that S$f$-AC performs competitively.
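As a rough illustration of the Taylor-series idea in the abstract, the sketch below uses the Jeffreys divergence (a standard symmetric $f$-divergence with $f(t) = (t-1)\log t$) on two small discrete distributions. It is not the S$f$-AC algorithm, and the paper's actual decomposition and truncation may differ; the distributions `p` and `q` and all function names are assumptions made for this example. It relies on the textbook expansion of $f$ around $t = 1$, which gives $D_f(P\|Q) = \sum_{n \ge 2} \frac{(-1)^n}{n-1}\,\mathbb{E}_Q[(p/q - 1)^n]$ when $|p/q - 1| < 1$.

```python
import math

def kl(p, q):
    """KL(P || Q) for discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def jeffreys_exact(p, q):
    """Jeffreys divergence as an f-divergence: E_Q[f(p/q)], f(t) = (t - 1) * log(t)."""
    return sum(qi * (pi / qi - 1.0) * math.log(pi / qi) for pi, qi in zip(p, q))

def jeffreys_taylor(p, q, order):
    """Truncated Taylor series of the Jeffreys divergence around p/q = 1.

    Uses sum_{n=2}^{order} (-1)^n / (n - 1) * E_Q[(p/q - 1)^n].
    Only valid when |p/q - 1| < 1 everywhere (assumption of this sketch).
    """
    total = 0.0
    for n in range(2, order + 1):
        chi_n = sum(qi * (pi / qi - 1.0) ** n for pi, qi in zip(p, q))
        total += (-1) ** n / (n - 1) * chi_n
    return total

# Hypothetical example distributions, chosen so that p/q stays close to 1.
p = [0.30, 0.45, 0.25]
q = [0.25, 0.50, 0.25]

exact = jeffreys_exact(p, q)
symmetric_sum = kl(p, q) + kl(q, p)   # Jeffreys = forward KL + reverse KL
approx_2 = jeffreys_taylor(p, q, 2)   # second-order (chi-square) truncation
approx_8 = jeffreys_taylor(p, q, 8)   # higher-order truncation

print(exact, symmetric_sum, approx_2, approx_8)
```

The series avoids evaluating logarithms of density ratios, which is one plausible reading of why a Taylor expansion can ease numerical issues; the truncation is only trustworthy when the two distributions are close.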
