Robust $Q$-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty
This work addresses robust stochastic optimal control in applications where the estimated transition probabilities may be inaccurate, with the aim of obtaining policies that remain reliable when the model is misspecified.
The authors solve distributionally robust Markov decision processes under Wasserstein uncertainty by developing a novel $Q$-learning algorithm. They prove its convergence and use examples, some based on real data, to demonstrate its tractability and its benefits when the estimated distributions are misspecified.
We present a novel $Q$-learning algorithm tailored to solve distributionally robust Markov decision problems in which the ambiguity set of transition probabilities for the underlying Markov decision process is a Wasserstein ball around a (possibly estimated) reference measure. We prove convergence of the presented algorithm and provide several examples, including some based on real data, to illustrate both the tractability of our algorithm and the benefits of considering distributional robustness when solving stochastic optimal control problems, in particular when the estimated distributions turn out to be misspecified in practice.
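To make the notion of a worst case over a Wasserstein ball concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a finite state space, the Wasserstein distance of order 1, and a grid search over the dual variable. It relies on the standard duality $\inf_{P:\, W_1(P,\hat{P}) \le \varepsilon} \mathbb{E}_P[f] = \sup_{\lambda \ge 0} \{ -\lambda \varepsilon + \sum_i \hat{p}_i \min_j ( f(y_j) + \lambda\, d(x_i, y_j) ) \}$. The names `worst_case_expectation`, `robust_backup`, `p_hat`, `dist`, `eps`, and `lambdas` are hypothetical, and the backup shown is a deterministic robust Bellman sweep rather than a stochastic $Q$-learning update.

```python
import numpy as np


def worst_case_expectation(f, p_hat, dist, eps, lambdas):
    """Approximate inf of E_P[f] over the order-1 Wasserstein ball of radius eps
    around p_hat, by maximizing the dual objective over a finite grid of lambdas.

    f       : (n,) values of the function on the n states
    p_hat   : (n,) reference probabilities
    dist    : (n, n) ground distances between states, with zero diagonal
    lambdas : (K,) nonnegative candidate dual variables (assumed grid)
    """
    # inner[k, i] = min_j ( f[j] + lambdas[k] * dist[i, j] )
    inner = (f[None, None, :] + lambdas[:, None, None] * dist[None, :, :]).min(axis=2)
    # Dual objective for each candidate lambda; every entry is a lower bound
    # on the worst-case expectation, so the grid maximum is one as well.
    dual_values = -lambdas * eps + inner @ p_hat
    return float(dual_values.max())


def robust_backup(Q, reward, p_hat, dist, eps, lambdas, discount):
    """One sweep of a robust Bellman backup over all state-action pairs.

    Q      : (n_states, n_actions) current Q-table
    reward : (n_states, n_actions, n_states) reward r(x, a, x')
    p_hat  : (n_states, n_actions, n_states) reference transition probabilities
    """
    n_states, n_actions = Q.shape
    v = Q.max(axis=1)  # greedy value of each next state
    Q_new = np.empty_like(Q)
    for x in range(n_states):
        for a in range(n_actions):
            # Function of the next state whose worst-case expectation is taken.
            f = reward[x, a] + discount * v
            Q_new[x, a] = worst_case_expectation(f, p_hat[x, a], dist, eps, lambdas)
    return Q_new
```

With `eps = 0` and a grid that reaches sufficiently large values of the dual variable, the backup reduces to the usual non-robust Bellman backup under the reference measure; increasing `eps` can only lower the resulting values, which reflects the more pessimistic view of the transition probabilities.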