LG AI OC
Oct 21, 2022

On the connection between Bregman divergence and value in regularized Markov decision processes

arXiv:2210.12160v45.82 citations
h-index: 25

Originality: Synthesis-oriented

AI Analysis

The paper offers a theoretical insight for reinforcement learning practitioners, though it appears incremental because it builds on existing regularization frameworks.

The paper derives a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in regularized Markov decision processes, with implications for multi-task and offline reinforcement learning.

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.
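The abstract does not state the result itself, so the following is only an illustrative sketch of one familiar special case, not the note's general statement. It assumes entropy regularization with temperature `tau`, where the Bregman divergence reduces to the KL divergence and the gap satisfies V* - V^pi = (I - gamma P_pi)^{-1} (tau KL(pi || pi*)). The random tabular MDP, the function name `soft_value_gap_check`, and all parameter values are hypothetical choices made for this check.

```python
import numpy as np

def soft_value_gap_check(n_states=4, n_actions=3, gamma=0.9, tau=0.5, seed=0):
    """Compare V* - V^pi with the discounted sum of tau * KL(pi || pi*).

    Assumes an entropy-regularized tabular MDP (illustrative setting only).
    """
    rng = np.random.default_rng(seed)

    # Hypothetical random MDP: P[s, a, s'] transitions, r[s, a] rewards.
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((n_states, n_actions))

    # Soft value iteration: V(s) = tau * logsumexp(Q(s, .) / tau).
    V_star = np.zeros(n_states)
    for _ in range(2000):
        Q = r + gamma * (P @ V_star)          # shape (S, A)
        m = Q.max(axis=1)                     # stabilise the log-sum-exp
        V_star = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))

    # Optimal regularized policy in log form: log pi*(a|s) = (Q* - V*) / tau.
    Q_star = r + gamma * (P @ V_star)
    log_pi_star = (Q_star - V_star[:, None]) / tau

    # An arbitrary (suboptimal) policy with full support.
    pi = rng.random((n_states, n_actions))
    pi /= pi.sum(axis=1, keepdims=True)

    # Exact regularized policy evaluation by solving a linear system.
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = (pi * (r - tau * np.log(pi))).sum(axis=1)
    M = np.eye(n_states) - gamma * P_pi
    V_pi = np.linalg.solve(M, r_pi)

    # Per-state KL(pi || pi*), then its discounted sum along pi's trajectories.
    kl = (pi * (np.log(pi) - log_pi_star)).sum(axis=1)
    discounted_kl = np.linalg.solve(M, tau * kl)

    return V_star - V_pi, discounted_kl

gap, discounted_kl = soft_value_gap_check()
print(np.max(np.abs(gap - discounted_kl)))
```

Under these assumptions the two returned vectors should agree up to numerical error, and the gap is nonnegative in every state because the KL divergence is nonnegative.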
