LG · Jun 9, 2023

Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

arXiv:2306.05859v2 · 10 citations · h-index: 81
Originality: Incremental advance
AI Analysis

This work addresses the limited scalability of robust sequential decision-making, a concern for reinforcement learning researchers and practitioners. Its practical solution is an incremental advance that builds on existing non-robust algorithms.

The authors scale robust Markov Decision Processes (RMDPs) to high-dimensional domains with EWoK, an online method that estimates the worst transition kernel in order to learn robust policies. EWoK works with any non-robust RL algorithm, and experiments ranging from Cartpole to DeepMind Control Suite environments demonstrate its effectiveness.

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solve RMDP that Estimates the Worst transition Kernel to learn robust policies. Unlike previous works that regularize the policy or value updates, EWoK achieves robustness by simulating the worst scenarios for the agent while retaining complete flexibility in the learning process. Notably, EWoK can be applied on top of any off-the-shelf non-robust RL algorithm, enabling easy scaling to high-dimensional domains. Our experiments, spanning from simple Cartpole to high-dimensional DeepMind Control Suite environments, demonstrate the effectiveness and applicability of the EWoK paradigm as a practical method for learning robust policies.
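
The abstract's idea of "simulating the worst scenarios" while leaving the learner untouched can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the worst kernel is estimated, so the exponential reweighting of candidate next states by their estimated value is an assumption made for illustration only. The names `sample_adversarial_next_state`, `pessimistic_step`, `nominal_step`, `value_fn`, `n_candidates`, and `temperature` are all hypothetical.

```python
import math
import random


def sample_adversarial_next_state(candidates, value_fn, temperature, rng=random):
    """Pick one candidate next state, favouring low-value (bad) ones.

    ASSUMPTION: weights proportional to exp(-value / temperature) are an
    illustrative choice, not something stated in the abstract.
    """
    values = [value_fn(s) for s in candidates]
    v_min = min(values)
    # Shift by the minimum so the worst candidate has weight exactly 1
    # and no exponent is positive (avoids overflow).
    weights = [math.exp(-(v - v_min) / temperature) for v in values]
    return rng.choices(candidates, weights=weights, k=1)[0]


def pessimistic_step(nominal_step, state, action, value_fn,
                     n_candidates=5, temperature=1.0, rng=random):
    """Wrap a nominal simulator so that transitions look like a worse kernel.

    `nominal_step(state, action, rng)` is a hypothetical callable that draws
    one next state from the nominal transition kernel.
    """
    candidates = [nominal_step(state, action, rng) for _ in range(n_candidates)]
    return sample_adversarial_next_state(candidates, value_fn, temperature, rng)
```

Because only the transition sampling is wrapped, whatever non-robust RL algorithm consumes the transitions needs no change, which is the property the abstract emphasizes. In this sketch a smaller `temperature` makes the sampled transitions more adversarial, and a very large one recovers the nominal simulator.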

