LG OC Jul 9, 2024

MDP Geometry, Normalization and Reward Balancing Solvers

Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis

arXiv:2407.06712v47.93 citations h-index: 41

Originality: Incremental advance

AI Analysis

This work addresses efficiency in reinforcement learning by offering a new algorithmic approach with proven gains, though it appears incremental in nature.

The authors tackled the problem of solving Markov Decision Processes (MDPs) by introducing a geometric interpretation and normalization procedure that preserves action advantages, leading to Reward Balancing algorithms that improve sample complexity for MDPs with unknown transition probabilities.

We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call Reward Balancing, that solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
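The advantage-preserving property described in the abstract can be checked numerically. The sketch below is an illustration only: it assumes the transformation shifts rewards by a per-state offset, r'(s, a) = r(s, a) - delta[s] + gamma * sum over s' of P(s' | s, a) * delta[s'], which may differ in detail from the paper's normalization. The function names (`policy_values`, `advantages`, `shift_rewards`) and the random test MDP are hypothetical and not taken from the paper.

```python
import numpy as np

def policy_values(P, r, pi, gamma):
    """Exact values of a deterministic policy pi by solving (I - gamma P_pi) V = r_pi."""
    S = r.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, pi]   # (S, S) transition matrix under pi
    r_pi = r[idx, pi]   # (S,) rewards under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def advantages(P, r, pi, gamma):
    """Advantage of every action in every state with respect to policy pi."""
    V = policy_values(P, r, pi, gamma)
    Q = r + gamma * P @ V   # (S, A) action values
    return Q - V[:, None]

def shift_rewards(P, r, delta, gamma):
    """Assumed form of the transformation: per-state offset delta applied to rewards."""
    return r - delta[:, None] + gamma * P @ delta

# Small random MDP (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)   # normalize rows into probability distributions
r = rng.normal(size=(S, A))
pi = rng.integers(A, size=S)        # an arbitrary deterministic policy
delta = rng.normal(size=S)          # an arbitrary per-state shift

r_new = shift_rewards(P, r, delta, gamma)
V_old = policy_values(P, r, pi, gamma)
V_new = policy_values(P, r_new, pi, gamma)
adv_old = advantages(P, r, pi, gamma)
adv_new = advantages(P, r_new, pi, gamma)
```

Under this assumed form, the values of any policy move by exactly `-delta` while every advantage stays the same, so the ranking of actions, and hence the optimal policy, is unaffected by the shift.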
