LG / AI / OC, Oct 21, 2022

On the connection between Bregman divergence and value in regularized Markov decision processes

arXiv:2210.12160v4, 2 citations, h-index: 25
Originality: Synthesis-oriented
AI Analysis

This paper offers a theoretical insight for reinforcement learning practitioners, though it appears incremental because it builds on existing regularization frameworks.

The paper derives a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the value function in regularized Markov decision processes, with implications for multi-task and offline reinforcement learning.

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.
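To make the kind of relationship described above concrete, here is a minimal numerical sketch for the entropy-regularized special case, where the Bregman divergence generated by negative entropy is the KL divergence. In that case a standard identity holds: V*(s) - V^pi(s) equals the expected discounted sum, under pi, of tau * KL(pi(.|s_t) || pi*(.|s_t)). This is an illustration under stated assumptions and not necessarily the exact statement or notation of the note: the tabular random MDP, the temperature `tau`, and the function names `soft_optimal`, `soft_policy_value`, and `discounted_kl` are all hypothetical.

```python
import numpy as np


def soft_optimal(P, r, gamma, tau, iters=2000):
    """Soft value iteration for an entropy-regularized tabular MDP.

    P has shape (S, A, S), r has shape (S, A). Returns (V*, pi*).
    """
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * P @ V                      # shape (S, A)
        m = Q.max(axis=1)                          # stabilize the log-sum-exp
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    Q = r + gamma * P @ V
    pi_star = np.exp((Q - V[:, None]) / tau)       # softmax policy
    pi_star /= pi_star.sum(axis=1, keepdims=True)
    return V, pi_star


def soft_policy_value(P, r, gamma, tau, pi):
    """Regularized value of a fixed policy pi (reward plus tau * entropy)."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)          # state-to-state kernel under pi
    entropy = -(pi * np.log(pi)).sum(axis=1)
    r_pi = (pi * r).sum(axis=1) + tau * entropy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


def discounted_kl(P, gamma, tau, pi, pi_star):
    """Expected discounted sum of tau * KL(pi || pi*) along trajectories of pi."""
    S = pi.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)
    kl = (pi * (np.log(pi) - np.log(pi_star))).sum(axis=1)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, tau * kl)


# Hypothetical random MDP: 4 states, 3 actions.
rng = np.random.default_rng(0)
S, A, gamma, tau = 4, 3, 0.9, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))         # shape (S, A, S)
r = rng.uniform(size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)             # arbitrary stochastic policy

V_star, pi_star = soft_optimal(P, r, gamma, tau)
V_pi = soft_policy_value(P, r, gamma, tau, pi)
gap = discounted_kl(P, gamma, tau, pi, pi_star)

# The value gap and the discounted KL term should agree state by state.
print(np.max(np.abs((V_star - V_pi) - gap)))
```

The check solves two linear systems with the same matrix, I - gamma * P_pi, so both sides are evaluated under the current policy's state distribution rather than the optimal one. Other regularizers would replace the KL term with the corresponding Bregman divergence; that generalization is not exercised here.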
