LG · Apr 14, 2025

Moderate Actor-Critic Methods: Controlling Overestimation Bias via Expectile Loss

arXiv:2504.09929v1
h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses overestimation bias, a fundamental issue for reinforcement learning practitioners, though the advance is incremental because it builds on existing SOTA methods.

The paper tackles the problem of overestimation bias in model-free reinforcement learning by proposing a moderate target for Q-function updates, which reduces bias and improves performance in algorithms like DDPG and SAC.

Overestimation is a fundamental characteristic of model-free reinforcement learning (MF-RL), arising from the principles of temporal difference learning and the approximation of the Q-function. To address this challenge, we propose a novel moderate target in the Q-function update, formulated as a convex optimization of an overestimated Q-function and its lower bound. Our primary contribution lies in the efficient estimation of this lower bound through the lower expectile of the Q-value distribution conditioned on a state. Notably, our moderate target integrates seamlessly into state-of-the-art (SOTA) MF-RL algorithms, including Deep Deterministic Policy Gradient (DDPG) and Soft Actor Critic (SAC). Experimental results validate the effectiveness of our moderate target in mitigating overestimation bias in DDPG, SAC, and distributional RL algorithms.
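The abstract does not spell out the loss or the target, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It uses the standard expectile (asymmetric squared) loss to estimate a lower expectile of sampled Q-values, and it reads the "moderate target" as a convex combination of the Q-value and that lower bound. The function names, the mixing weight `beta`, the expectile level `tau`, and the scalar gradient-descent fit are all illustrative choices, not the authors' implementation.

```python
def expectile_loss(residuals, tau):
    """Asymmetric squared loss: weight tau on positive residuals, 1 - tau otherwise."""
    total = 0.0
    for u in residuals:
        weight = tau if u > 0 else 1.0 - tau
        total += weight * u * u
    return total / len(residuals)


def fit_expectile(samples, tau, lr=0.1, steps=2000):
    """Fit a scalar v minimising the expectile loss of (sample - v) by gradient descent.

    With tau < 0.5 the result sits below the mean, so it can serve as a
    conservative (lower) estimate of the sampled Q-values.
    """
    v = sum(samples) / len(samples)  # start from the mean (the tau = 0.5 expectile)
    for _ in range(steps):
        grad = 0.0
        for q in samples:
            u = q - v
            weight = tau if u > 0 else 1.0 - tau
            grad += -2.0 * weight * u  # derivative of weight * u^2 with respect to v
        v -= lr * grad / len(samples)
    return v


def moderate_target(reward, gamma, q_next, lower_bound, beta):
    """Hypothetical moderate TD target: blend of the Q-value and its lower bound.

    beta = 1 recovers the usual target; beta = 0 uses only the lower bound.
    """
    mixed = beta * q_next + (1.0 - beta) * lower_bound
    return reward + gamma * mixed


if __name__ == "__main__":
    # Made-up Q-value samples for several actions in one state.
    q_samples = [1.0, 2.0, 3.0, 10.0]
    lower = fit_expectile(q_samples, tau=0.1)
    print("lower expectile:", lower)
    print("moderate target:", moderate_target(1.0, 0.99, max(q_samples), lower, beta=0.5))
```

For the made-up samples above, the 0.1-expectile is 2, well below the sample mean of 4, which illustrates how a lower expectile pulls the target away from an optimistic estimate. In an actor-critic setting the scalar fit would presumably be replaced by a learned state-value network trained with the same loss, but that is an assumption beyond what the abstract states.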
