MLLGFeb 7, 2024

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

arXiv:2402.04493v25 citationsh-index: 54ICML
AI Analysis

This addresses computational inefficiency and strong data assumptions in offline RL for researchers and practitioners, representing a strong incremental improvement.

The paper tackles offline reinforcement learning with linear MDPs by proposing a primal-dual algorithm that achieves O(ε^{-2}) sample complexity with partial data coverage, improving upon prior work requiring O(ε^{-4}) samples, and extends it to handle constrained RL settings.

We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for this setting either require a uniform data coverage assumptions or are computationally inefficient for finding an $ε$-optimal policy with $O(ε^{-2})$ sample complexity. In this paper, we propose a primal dual algorithm for offline RL with linear MDPs in the infinite-horizon discounted setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves sample complexity of $O(ε^{-2})$ with partial data coverage assumption. Our work is an improvement upon a recent work that requires $O(ε^{-4})$ samples. Moreover, we extend our algorithm to work in the offline constrained RL setting that enforces constraints on additional reward signals.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes