ML AI LGJan 27, 2023

Single-Trajectory Distributionally Robust Reinforcement Learning

Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou

arXiv:2301.11721v217.416 citationsh-index: 37

Originality Highly original

AI Analysis

This work addresses the problem of robust policy learning in unknown test environments for RL practitioners, representing a novel method for a known bottleneck.

The authors tackled the challenge of distributionally robust reinforcement learning (DRRL) by developing a fully model-free algorithm, DRQ, that learns from a single trajectory, achieving superior robustness and sample complexity compared to existing methods.

To mitigate the limitation that the classical reinforcement learning (RL) framework heavily relies on identical training and test environments, Distributionally Robust RL (DRRL) has been proposed to enhance performance across a range of environments, possibly including unknown test environments. As a price for robustness gain, DRRL involves optimizing over a set of distributions, which is inherently more challenging than optimizing over a fixed distribution in the non-robust case. Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory. In this paper, we design a first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ). We delicately design a multi-timescale framework to fully utilize each incrementally arriving sample and directly learn the optimal distributionally robust policy without modelling the environment, thus the algorithm can be trained along a single trajectory in a model-free fashion. Despite the algorithm's complexity, we provide asymptotic convergence guarantees by generalizing classical stochastic approximation tools. Comprehensive experimental results demonstrate the superior robustness and sample complexity of our proposed algorithm, compared to non-robust methods and other robust RL algorithms.

View on arXiv PDF

Similar