LG · Oct 1, 2025

Rectifying Regression in Reinforcement Learning

arXiv:2510.00885v2 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses policy optimization in reinforcement learning and offers an incremental advance in how the loss function is chosen for value-based methods.

The paper tackles suboptimal policies in value-based reinforcement learning by analyzing the loss function. It shows theoretically that mean absolute error is a better prediction objective than mean squared error for controlling the suboptimality gap of the learned policy, and empirically that cross-entropy losses can outperform the squared loss in linear RL.
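To make the two error metrics and the suboptimality gap concrete, here is a minimal single-state (bandit-like) sketch in Python. It assumes a finite action set, known true action values `q_true`, and a hypothetical estimate `q_hat`; the function names are illustrative and this is not the paper's code. In this one-state setting, acting greedily on `q_hat` loses at most |e(a*)| + |e(â)| in true value, where e = `q_true` − `q_hat`, so the loss is bounded by the sum of absolute errors. That elementary bound is offered only as intuition and is not the paper's theorem.

```python
# Single-state (bandit-like) sketch; q_true and q_hat are hypothetical inputs,
# not data or code from the paper.


def greedy_action(q):
    """Index of the largest value in q (first index on ties)."""
    return max(range(len(q)), key=lambda a: q[a])


def suboptimality_gap(q_true, q_hat):
    """True value lost by acting greedily w.r.t. q_hat instead of q_true."""
    return max(q_true) - q_true[greedy_action(q_hat)]


def mean_squared_error(q_true, q_hat):
    return sum((t - p) ** 2 for t, p in zip(q_true, q_hat)) / len(q_true)


def mean_absolute_error(q_true, q_hat):
    return sum(abs(t - p) for t, p in zip(q_true, q_hat)) / len(q_true)


def gap_bound(q_true, q_hat):
    """Elementary bound |e(a*)| + |e(a_hat)| with e = q_true - q_hat.

    Holds because q_hat[a_star] - q_hat[a_hat] <= 0 when a_hat is greedy
    w.r.t. q_hat; it never exceeds the sum of all absolute errors.
    """
    a_star = greedy_action(q_true)
    a_hat = greedy_action(q_hat)
    err = [t - p for t, p in zip(q_true, q_hat)]
    return abs(err[a_star]) + abs(err[a_hat])


# Usage: the estimate ranks action 1 above action 0 and so picks the wrong one.
q_true = [1.0, 0.9, 0.0]
q_hat = [0.8, 0.95, 0.0]
gap = suboptimality_gap(q_true, q_hat)
print(gap, mean_squared_error(q_true, q_hat), mean_absolute_error(q_true, q_hat))
print(gap <= gap_bound(q_true, q_hat))
```

Here the greedy choice under `q_hat` is action 1, which costs about 0.1 in true value even though both error metrics are small.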

This paper investigates the impact of the loss function in value-based methods for reinforcement learning through an analysis of the underlying prediction objectives. We theoretically show that mean absolute error is a better prediction objective than the traditional mean squared error for controlling the learned policy's suboptimality gap. Furthermore, we present results showing that different loss functions are better aligned with these different regression objectives: binary and categorical cross-entropy losses with the mean absolute error, and the squared loss with the mean squared error. We then provide empirical evidence that algorithms minimizing these cross-entropy losses can outperform those based on the squared loss in linear reinforcement learning.
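The three losses named in the abstract can be sketched for a single scalar value target as follows. This is a hedged illustration, not the paper's implementation: it assumes targets already rescaled into [0, 1] for the binary cross-entropy, and for the categorical cross-entropy it assumes a fixed, sorted support of at least two atoms and a two-hot projection of the target, a common choice in classification-style value learning that may differ from the paper's. All function names and the example support are hypothetical.

```python
import math


def squared_loss(pred, target):
    """Standard squared error on a scalar value prediction."""
    return (pred - target) ** 2


def binary_cross_entropy(p, y, eps=1e-12):
    """BCE between a predicted value p in (0, 1) and a target y in [0, 1].

    Assumes values have been rescaled into [0, 1]; p is clipped for safety.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def two_hot(y, support):
    """Split y over its two neighbouring atoms so the target mean equals y.

    Assumes a sorted support with at least two atoms; y is clipped into range.
    """
    probs = [0.0] * len(support)
    y = min(max(y, support[0]), support[-1])
    for i in range(len(support) - 1):
        lo, hi = support[i], support[i + 1]
        if lo <= y <= hi:
            w = (y - lo) / (hi - lo)
            probs[i] = 1.0 - w
            probs[i + 1] = w
            break
    return probs


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def categorical_cross_entropy(logits, y, support):
    """Cross-entropy between the two-hot target for y and softmax(logits)."""
    target = two_hot(y, support)
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


def expected_value(logits, support):
    """Scalar value read out as the mean of the predicted distribution."""
    return sum(math.exp(lp) * z for lp, z in zip(log_softmax(logits), support))


# Usage with a hypothetical five-atom support on [0, 1].
support = [0.0, 0.25, 0.5, 0.75, 1.0]
print(two_hot(0.6, support))
print(categorical_cross_entropy([0.0] * 5, 0.6, support))
print(expected_value([0.0] * 5, support))
```

Because a two-hot target's mean equals the scalar target, the model's expected value over the support can serve as its value prediction when acting greedily.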
