Fitted Q-Iteration via Max-Plus-Linear Approximation
This work addresses computational efficiency in offline RL, offering a method with theoretical guarantees, though it appears incremental as it builds on existing FQI frameworks.
The paper tackles the problem of offline reinforcement learning for discounted Markov decision processes by proposing novel fitted Q-iteration algorithms using max-plus-linear approximators, achieving provable convergence and reducing per-iteration complexity to be independent of the number of samples.
In this study, we consider the application of max-plus-linear approximators for Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we incorporate these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider the variational implementation of the proposed algorithm which leads to a per-iteration complexity that is independent of the number of samples.