Approximations and Learning for Continuous State and Action MDPs under Average Cost Criteria

arXiv:2308.07591

AI Analysis

The paper extends theoretical guarantees for average-cost MDPs to weaker continuity assumptions, which benefits researchers working on reinforcement learning in continuous spaces.

This paper provides approximation methods and Q-learning algorithms for continuous state and action MDPs under average cost criteria. It relaxes the total variation continuity conditions of prior work to weak or Wasserstein continuity and establishes convergence to near-optimal solutions.
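For reference, the standard average-cost criterion, its optimality equation, and a commonly used quantized finite model are written out below in generic notation. The symbols (kernel $T$, relative value function $h$, optimal cost $\rho^*$, bins $B_i$ with representatives $y_i$, weighting measure $\pi^*$) are our assumptions; the paper's notation and exact finite-model construction may differ.

```latex
% Average-cost criterion for policy \gamma from initial state x
J(x,\gamma) = \limsup_{N\to\infty} \frac{1}{N}\, E_x^{\gamma}\Big[\sum_{t=0}^{N-1} c(x_t,u_t)\Big],
\qquad J^*(x) = \inf_{\gamma} J(x,\gamma)

% Average-cost optimality equation (ACOE), when a solution (\rho^*, h) exists
% and the minima are attained
\rho^* + h(x) = \min_{u\in\mathbb{U}} \Big[ c(x,u) + \int_{\mathbb{X}} h(y)\, T(dy\mid x,u) \Big]

% Equivalent Q-form (Q^* is determined up to an additive constant; \min_u Q^*(x,u) = h(x))
Q^*(x,u) = c(x,u) + \int_{\mathbb{X}} \min_{v\in\mathbb{U}} Q^*(y,v)\, T(dy\mid x,u) - \rho^*

% Quantized finite model on bins B_i with representatives y_i, weighted by \pi^*, \pi^*(B_i) > 0
c_n(y_i,u) = \int_{B_i} c(x,u)\, \frac{\pi^*(dx)}{\pi^*(B_i)}, \qquad
P_n(y_j\mid y_i,u) = \int_{B_i} T(B_j\mid x,u)\, \frac{\pi^*(dx)}{\pi^*(B_i)}
```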

In this paper, for Markov Decision Processes (MDPs) with standard Borel spaces, (i) we first provide a discretization-based approximation method for MDPs with continuous spaces under average cost criteria, and provide error bounds for the approximations under certain ergodicity assumptions when the dynamics are only weakly continuous (yielding asymptotic convergence of the errors as the grid sizes vanish) or Wasserstein continuous (yielding a rate of approximation as the grid sizes vanish). In particular, we relax the total variation continuity condition given in prior work to weak continuity or Wasserstein continuity. (ii) We provide synchronous and asynchronous (quantized) Q-learning algorithms for continuous spaces via quantization, in which the quantized state is treated as the actual state, and establish their convergence. (iii) Finally, we show that the iterates converge to the optimal Q values of a finite approximate model constructed via quantization, which implies near optimality of the resulting solution.
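A minimal sketch of the three ingredients above, assuming a hypothetical one-dimensional MDP on a circle; the dynamics, cost, grid sizes, step-size schedule, and all function names are illustrative assumptions, not taken from the paper. It quantizes states and actions into a finite model, solves that model with relative value iteration (RVI) for the average cost, and runs asynchronous quantized Q-learning in which the bin index stands in for the true state. The average-cost update is swapped in from RVI Q-learning in the style of Abounadi, Bertsekas and Borkar, with f(Q) = Q(ref); the paper's own algorithms may differ. The finite model here evaluates the kernel at bin representatives, a simplification of the bin-averaged construction.

```python
import random

# Hypothetical continuous MDP (illustrative assumption, not from the paper):
# state x on the circle [0, 1), action u in [0, 1],
# x' = (x + u - 0.5 + w) mod 1 with w ~ Uniform(-NOISE, NOISE),
# stage cost c(x, u) = (x - 0.5)^2 + 0.05 (u - 0.5)^2.
NOISE = 0.1
N_X, N_U = 20, 5                                   # state bins, action grid size
ACTIONS = [k / (N_U - 1) for k in range(N_U)]      # finite action set {0, 0.25, ..., 1}
REPS = [(i + 0.5) / N_X for i in range(N_X)]       # bin representatives y_i


def cost(x, u):
    return (x - 0.5) ** 2 + 0.05 * (u - 0.5) ** 2


def step(x, u, rng):
    """Simulate one transition of the continuous system."""
    return (x + u - 0.5 + rng.uniform(-NOISE, NOISE)) % 1.0


def quantize(x):
    """Map a continuous state to the index of its bin."""
    return min(int(x * N_X), N_X - 1)


def finite_model():
    """Finite model with the kernel evaluated at each bin representative."""
    costs, probs = [], []
    for y in REPS:
        c_row, p_row = [], []
        for u in ACTIONS:
            lo, hi = y + u - 0.5 - NOISE, y + u - 0.5 + NOISE  # support before wrapping
            # Mass of Uniform[lo, hi] landing in bin j, counting the wrapped copies
            # of bin j shifted by -1, 0, +1 (the support lies inside [-1, 2)).
            row = [sum(max(0.0, min(hi, (j + 1) / N_X + s) - max(lo, j / N_X + s))
                       for s in (-1, 0, 1)) / (2 * NOISE) for j in range(N_X)]
            c_row.append(cost(y, u))
            p_row.append(row)
        costs.append(c_row)
        probs.append(p_row)
    return costs, probs


def relative_value_iteration(costs, probs, iters=300, ref=0):
    """Average-cost RVI: h <- T h - (T h)(ref); (T h)(ref) estimates rho."""
    h, rho = [0.0] * N_X, 0.0
    for _ in range(iters):
        th = [min(costs[i][k] + sum(p * hj for p, hj in zip(probs[i][k], h))
                  for k in range(N_U)) for i in range(N_X)]
        rho = th[ref]
        h = [v - rho for v in th]
    return rho, h


def quantized_q_learning(steps=200_000, seed=0, ref=(0, 0)):
    """Asynchronous RVI Q-learning in which the bin index is treated as the state."""
    rng = random.Random(seed)
    q = [[0.0] * N_U for _ in range(N_X)]
    visits = [[0] * N_U for _ in range(N_X)]
    x = rng.random()
    for _ in range(steps):
        i, k = quantize(x), rng.randrange(N_U)      # uniform exploration over actions
        x_next = step(x, ACTIONS[k], rng)
        visits[i][k] += 1
        alpha = visits[i][k] ** -0.7                # hypothetical step-size schedule
        # RVI target: observed cost + min_v Q(next bin, v) - f(Q), with f(Q) = Q(ref)
        target = cost(x, ACTIONS[k]) + min(q[quantize(x_next)]) - q[ref[0]][ref[1]]
        q[i][k] += alpha * (target - q[i][k])
        x = x_next
    return q[ref[0]][ref[1]], q                     # f(Q) estimates the average cost


if __name__ == "__main__":
    costs, probs = finite_model()
    rho_model, _ = relative_value_iteration(costs, probs)
    rho_q, _ = quantized_q_learning()
    print(f"finite-model average cost (RVI): {rho_model:.4f}")
    print(f"quantized Q-learning estimate:   {rho_q:.4f}")
```

In the quantized Q-learning literature, the limit typically corresponds to a model that averages each bin under the state distribution induced by the exploration policy, not to the representative-point model built here. The two printed estimates are therefore expected to be close for fine grids but need not coincide.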

