A Fast Convergence Theory for Offline Decision Making
This work provides a foundational theory for offline reinforcement learning and off-policy evaluation, addressing a key bottleneck in data efficiency for researchers and practitioners.
The paper tackles the problem of slow convergence in offline decision making by introducing a framework called Decision Making with Offline Feedback (DMOF) and proposing the Empirical Decision with Divergence (EDD) algorithm, which achieves a fast convergence rate of 1/N under partial coverage assumptions.
This paper proposes the first generic fast convergence result in general function approximation for offline decision making problems, which include offline reinforcement learning (RL) and off-policy evaluation (OPE) as special cases. To unify different settings, we introduce a framework called Decision Making with Offline Feedback (DMOF), which captures a wide range of offline decision making problems. Within this framework, we propose a simple yet powerful algorithm called Empirical Decision with Divergence (EDD), whose upper bound can be termed as a coefficient named Empirical Offline Estimation Coefficient (EOEC). We show that EOEC is instance-dependent and actually measures the correlation of the problem. When assuming partial coverage in the dataset, EOEC will reduce in a rate of $1/N$ where $N$ is the size of the dataset, endowing EDD with a fast convergence guarantee. Finally, we complement the above results with a lower bound in the DMOF framework, which further demonstrates the soundness of our theory.