LG · Apr 8

Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem

Hyukjun Yang, Han-Dong Lim, Donghwan Lee

arXiv:2604.06837 · 5.8

Predicted impact: top 64% in LG · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses a fundamental challenge in reinforcement learning by providing a principled connection between residual minimization and Bellman contraction that improves error control, though it appears incremental because it extends existing formulations.

The paper tackles the geometric mismatch between the Bellman optimality operator's contraction in the L∞-norm and the L2-based optimization objectives used for Markov decision processes. It shows that soft Bellman residual minimization with a weighted Lp-norm aligns with the contraction geometry as p increases, and it derives corresponding error bounds.

The problem of solving Markov decision processes under function approximation remains a fundamental challenge, even under linear function approximation settings. A key difficulty arises from a geometric mismatch: while the Bellman optimality operator is contractive in the L∞-norm, commonly used objectives such as projected value iteration and Bellman residual minimization rely on L2-based formulations. To enable gradient-based optimization, we consider a soft formulation of Bellman residual minimization and extend it to a generalized weighted Lp-norm. We show that this formulation aligns the optimization objective with the contraction geometry of the Bellman operator as p increases, and derive corresponding performance error bounds. Our analysis provides a principled connection between residual minimization and Bellman contraction, leading to improved control of error propagation while remaining compatible with gradient-based optimization.
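The geometric point in the abstract can be illustrated numerically. The sketch below is not the authors' implementation. It assumes that the "soft" formulation replaces the max over actions with a temperature-scaled log-sum-exp, that the MDP is tabular, and that the weights form a probability distribution over state-action pairs. The names `soft_bellman_residual`, `weighted_lp_norm`, `tau`, and `demo` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def soft_bellman_residual(q, rewards, transitions, gamma, tau):
    """Residual (T_tau q) - q for a tabular MDP (illustrative assumption).

    q, rewards: arrays of shape (S, A); transitions: shape (S, A, S).
    The max over actions is smoothed by tau * log-sum-exp(q / tau).
    """
    q_max = q.max(axis=1)
    # Numerically stable log-sum-exp over the action axis.
    soft_v = q_max + tau * np.log(np.exp((q - q_max[:, None]) / tau).sum(axis=1))
    target = rewards + gamma * (transitions @ soft_v)  # shape (S, A)
    return target - q


def weighted_lp_norm(x, weights, p):
    """(sum_i w_i |x_i|^p)^(1/p), rescaled by max|x_i| to avoid overflow."""
    abs_x = np.abs(np.asarray(x, dtype=float))
    scale = abs_x.max()
    if scale == 0.0:
        return 0.0
    return float(scale * (weights * (abs_x / scale) ** p).sum() ** (1.0 / p))


def demo():
    # Random MDP and a random Q-table, purely for illustration.
    rng = np.random.default_rng(0)
    n_states, n_actions = 6, 3
    transitions = rng.random((n_states, n_actions, n_states))
    transitions /= transitions.sum(axis=2, keepdims=True)
    rewards = rng.random((n_states, n_actions))
    q = rng.normal(size=(n_states, n_actions))

    residual = soft_bellman_residual(q, rewards, transitions, gamma=0.9, tau=0.1).ravel()
    weights = np.full(residual.size, 1.0 / residual.size)  # uniform weights

    for p in (2, 4, 8, 16, 64):
        print(f"p={p:>2}: weighted Lp residual = {weighted_lp_norm(residual, weights, p):.4f}")
    print(f"sup-norm residual      = {np.abs(residual).max():.4f}")


if __name__ == "__main__":
    demo()
```

With weights that sum to one, the weighted Lp value is non-decreasing in p and never exceeds the sup-norm of the residual, so raising p moves the objective toward the norm in which the Bellman operator contracts. The sketch only evaluates the objective; minimizing it over the parameters of a linear function approximator, as the abstract discusses, would need a gradient step on top of this.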
