Towards Theoretical Understandings of Robust Markov Decision Processes: Sample Complexity and Asymptotics
This work provides theoretical guarantees for robust reinforcement learning, addressing uncertainty in decision-making processes, but it is incremental as it builds on prior non-asymptotic analyses by extending to more uncertainty sets and assumptions.
The paper tackles the sample complexity and asymptotic properties of robust Markov Decision Processes under various uncertainty sets, showing a sample complexity of about Õ(|S|²|A|/(ε²ρ²(1-γ)⁴)) under (s,a)-rectangular assumptions and extending results to s-rectangular cases with generally larger complexities, while also proving asymptotic normality of the optimal robust value function at a √n rate.
In this paper, we study the non-asymptotic and asymptotic performances of the optimal robust policy and value function of robust Markov Decision Processes(MDPs), where the optimal robust policy and value function are solved only from a generative model. While prior work focusing on non-asymptotic performances of robust MDPs is restricted in the setting of the KL uncertainty set and $(s,a)$-rectangular assumption, we improve their results and also consider other uncertainty sets, including $L_1$ and $χ^2$ balls. Our results show that when we assume $(s,a)$-rectangular on uncertainty sets, the sample complexity is about $\widetilde{O}\left(\frac{|\mathcal{S}|^2|\mathcal{A}|}{\varepsilon^2ρ^2(1-γ)^4}\right)$. In addition, we extend our results from $(s,a)$-rectangular assumption to $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty sets and is generally larger than the case under $(s,a)$-rectangular assumption. Moreover, we also show that the optimal robust value function is asymptotic normal with a typical rate $\sqrt{n}$ under $(s,a)$ and $s$-rectangular assumptions from both theoretical and empirical perspectives.