Q-Learning under Finite Model Uncertainty
This work addresses robust reinforcement learning with finite ambiguity sets, a flexible alternative to the common KL and Wasserstein ball formulations, although the contribution appears incremental relative to existing robust methods.
The authors tackle robust Q-learning in Markov decision processes with finite model uncertainty. They propose an algorithm that converges almost surely to the robust optimum, and they derive non-asymptotic high-probability error bounds that separate stochastic approximation error from transition-kernel estimation error.
We propose a robust Q-learning algorithm for Markov decision processes under model uncertainty when each state-action pair is associated with a finite ambiguity set of candidate transition kernels. This finite-measure framework enables highly flexible, user-designed uncertainty models and goes beyond the common KL and Wasserstein ball formulations. We establish almost sure convergence of the learned Q-function to the robust optimum, and derive non-asymptotic high-probability error bounds that separate stochastic approximation error from transition-kernel estimation error. Finally, we show that Wasserstein ball and parametric ambiguity sets can be approximated by finite ambiguity sets, allowing our algorithm to be used as a generic solver beyond the finite setting.
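To make the finite ambiguity set concrete, the following is a minimal sketch and not the algorithm from the paper, whose details are not given in this abstract. It assumes the K candidate kernels for each state-action pair are known exactly and stored in one array, so the worst-case expectation min_k sum_s' P_k(s' | s, a) max_a' Q(s', a') can be computed directly. The names robust_value_iteration and robust_q_updates, the array layout, the constant step size, and the toy problem are all hypothetical choices made for illustration. The second function applies asynchronous Q-learning-style updates at randomly chosen state-action pairs and is compared against the robust fixed point obtained by plain iteration.

```python
import numpy as np


def robust_value_iteration(rewards, kernels, gamma, n_iters=2000):
    """Iterate the robust Bellman operator to (near) its fixed point.

    rewards: array of shape (S, A).
    kernels: array of shape (K, S, A, S); kernels[k, s, a] is the k-th
             candidate next-state distribution for the pair (s, a).
    """
    q = np.zeros_like(rewards, dtype=float)
    for _ in range(n_iters):
        v = q.max(axis=1)                    # greedy state values, shape (S,)
        expected = kernels @ v               # expectation per candidate, (K, S, A)
        q = rewards + gamma * expected.min(axis=0)  # worst case over candidates
    return q


def robust_q_updates(rewards, kernels, gamma, n_steps=20000, alpha=0.5, seed=0):
    """Asynchronous updates at random (s, a) pairs with a constant step size.

    Assumption: the worst-case expectation is computed exactly from the
    known candidate kernels rather than estimated from samples.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = rewards.shape
    q = np.zeros((n_states, n_actions))
    for _ in range(n_steps):
        s = rng.integers(n_states)
        a = rng.integers(n_actions)
        v = q.max(axis=1)
        worst = (kernels[:, s, a, :] @ v).min()   # min over the finite set
        q[s, a] += alpha * (rewards[s, a] + gamma * worst - q[s, a])
    return q


# Toy problem (made up for illustration): 3 states, 2 actions, 4 candidates.
rng = np.random.default_rng(42)
n_candidates, n_states, n_actions = 4, 3, 2
rewards = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
kernels = rng.dirichlet(np.ones(n_states),
                        size=(n_candidates, n_states, n_actions))
gamma = 0.9

q_fixed_point = robust_value_iteration(rewards, kernels, gamma)
q_async = robust_q_updates(rewards, kernels, gamma)

# Non-robust Q-function under a single candidate kernel, for comparison.
q_nominal = robust_value_iteration(rewards, kernels[:1], gamma)
```

In this sketch the robust Q-function is never larger than the Q-function computed under any single candidate kernel, which reflects the pessimism built into the worst-case operator. A sample-based variant would replace the exact expectation with estimates, which is where the kernel estimation error mentioned in the abstract would enter.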