ML LGFeb 7, 2024

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

arXiv:2402.04493v212.05 citationsh-index: 54ICML

Originality Highly original

AI Analysis

This addresses computational inefficiency and strong data assumptions in offline RL for researchers and practitioners, representing a strong incremental improvement.

The paper tackles offline reinforcement learning with linear MDPs by proposing a primal-dual algorithm that achieves O(ε^{-2}) sample complexity with partial data coverage, improving upon prior work requiring O(ε^{-4}) samples, and extends it to handle constrained RL settings.

We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumptions or are computationally inefficient for finding an $ε$-optimal policy with $O(ε^{-2})$ sample complexity. In this paper, we propose a primal dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves sample complexity of $O(ε^{-2})$ with partial data coverage assumption. Our work is an improvement upon a recent work that requires $O(ε^{-4})$ samples. Moreover, we extend our algorithm to work in the offline constrained RL setting that enforces constraints on additional reward signals.

View on arXiv PDF

Similar