LGAIOCMLMar 17, 2021

Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

arXiv:2103.09847v117 citations
Originality Incremental advance
AI Analysis

This addresses a fundamental theoretical problem in offline RL for researchers, providing both lower bounds and algorithmic guarantees, though it is incremental in refining existing analysis.

The paper tackles the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning with linear function approximation, identifying a hard regime where the lower bound is exponential in dimension, and proposes an algorithm with polynomial sample complexity under low distribution shift assumptions.

In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $dγ^{2}>1$, where $d$ is the dimension of the feature vector and $γ$ is the discount rate. In this regime, for any $q\in[γ^{2},1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $Ω\left(\frac{d}{γ^{2}\left(q-γ^{2}\right)\varepsilon^{2}}\exp\left(Θ\left(dγ^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$. Note that the lower bound of the sample complexity is exponential in $d$. If $q=γ^{2}$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert θ^π\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}δ,\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}δ\right)\right\} \right)$ samples ($θ^π$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-δ$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes