LG ML | Nov 12, 2024

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

arXiv:2411.07514v22.6 | h-index: 7 | IEEE Trans Inf Theory

Originality: Incremental advance

AI Analysis

The paper addresses robust policy learning from offline data in non-Markovian settings, an incremental advance over existing robust RL methods, which are limited to Markovian settings or to planning.

The paper tackles robust offline reinforcement learning for non-Markovian decision processes. When the nominal model has low-rank structure, the proposed algorithm finds an ε-optimal robust policy from O(1/ε²) offline samples; an extension to nominal models without specific structure retains polynomial sample complexity.

Distributionally robust offline reinforcement learning (RL) aims to find a policy that performs best under the worst-case environment within an uncertainty set, using an offline dataset collected from a nominal model. While recent advances in robust RL focus on Markov decision processes (MDPs), robust non-Markovian RL has been limited to the planning problem, in which the transitions in the uncertainty set are known. In this paper, we study the learning problem of robust offline non-Markovian RL. Specifically, when the nominal model admits a low-rank structure, we propose a new algorithm featuring a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets. We also derive new dual forms for these robust values in non-Markovian RL, making our algorithm more amenable to practical implementation. By further introducing a novel type-I concentrability coefficient tailored for offline low-rank non-Markovian decision processes, we prove that our algorithm can find an $ε$-optimal robust policy using $O(1/ε^2)$ offline samples. Moreover, we extend our algorithm to the case where the nominal model has no specific structure. With a new type-II concentrability coefficient, the extended algorithm also enjoys polynomial sample efficiency under all of these types of uncertainty sets.
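To make the notions of a "robust value" and its "dual form" concrete, here is a minimal sketch for a much simpler setting than the paper's: a single-step worst-case expectation over a finite outcome set. It is not the paper's algorithm. The abstract does not say which uncertainty sets are used, so the total-variation (TV) ball, the function names, and the numbers below are all assumptions chosen for illustration. The sketch computes the same worst-case value in two ways: directly, by shifting probability mass, and through a one-dimensional dual in a scalar `eta`.

```python
def worst_case_tv_primal(p0, v, rho):
    """Worst-case expectation of v over all P with TV(P, p0) <= rho.

    Assumes TV(P, p0) = 0.5 * sum_i |P[i] - p0[i]|. The minimizer moves up to
    rho probability mass from the highest-value outcomes to a lowest-value one.
    """
    p = list(p0)
    lo = min(range(len(v)), key=lambda i: v[i])  # index of a lowest-value outcome
    budget = rho
    # Visit outcomes from highest to lowest value, draining mass while budget remains.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if i == lo or budget <= 0:
            continue
        moved = min(p[i], budget)
        p[i] -= moved
        p[lo] += moved
        budget -= moved
    return sum(pi * vi for pi, vi in zip(p, v))


def worst_case_tv_dual(p0, v, rho):
    """Same quantity via a scalar dual:

        max over eta of  eta - E_p0[(eta - v)_+] - rho * (eta - min(v))_+

    The objective is concave and piecewise linear in eta with kinks only at
    the values in v, so checking those values is enough.
    """
    v_min = min(v)

    def objective(eta):
        shortfall = sum(pi * max(eta - vi, 0.0) for pi, vi in zip(p0, v))
        return eta - shortfall - rho * max(eta - v_min, 0.0)

    return max(objective(eta) for eta in set(v))


# Hypothetical nominal distribution, outcome values, and TV radius.
p0 = [0.2, 0.3, 0.5]
v = [1.0, 2.0, 4.0]
rho = 0.6
primal = worst_case_tv_primal(p0, v, rho)
dual = worst_case_tv_dual(p0, v, rho)
```

In this toy case both functions agree, and the dual needs only a search over one scalar rather than over distributions, which is the sense in which dual forms are usually described as easier to implement. With `rho = 0` both reduce to the nominal expectation.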

