LGMLFeb 26, 2023

A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Stanford
arXiv:2302.13203v339 citationsh-index: 37
Originality Incremental advance
AI Analysis

This provides the first sample complexity result for model-free robust RL, addressing robustness to environmental shifts in reinforcement learning.

The paper tackles the problem of reinforcement learning when the deployment environment differs from the training environment by extending a distributionally robust Q-learning framework, achieving a worst-case expected sample complexity bound of ̃O(|S||A|(1-γ)^{-5}ε^{-2}p_{∧}^{-6}δ^{-4}) for learning the optimal robust Q-function within an ε error.

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $ε$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-γ)^{-5}ε^{-2}p_{\wedge}^{-6}δ^{-4})$, where $γ$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $δ$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes