LG, Aug 16, 2024

Efficient Multi-Policy Evaluation for Reinforcement Learning

arXiv:2408.08706v3, 4 citations, h-index: 4
Originality: Highly original
AI Analysis

The paper addresses inefficient multi-policy evaluation, a practical problem for RL practitioners, and proposes a method that improves on existing approaches.

The paper tackles the inefficiency of evaluating multiple reinforcement learning policies separately by designing a tailored behavior policy that reduces estimator variance across all target policies, achieving state-of-the-art performance with substantially lower variance in various environments.

To unbiasedly evaluate multiple target policies, the dominant approach among RL practitioners is to run and evaluate each target policy separately. However, this evaluation method is far from efficient because samples are not shared across policies, and running target policies to evaluate themselves is actually not optimal. In this paper, we address these two weaknesses by designing a tailored behavior policy to reduce the variance of estimators across all target policies. Theoretically, we prove that executing this behavior policy with manyfold fewer samples outperforms on-policy evaluation on every target policy under characterized conditions. Empirically, we show our estimator has a substantially lower variance compared with previous best methods and achieves state-of-the-art performance in a broad range of environments.
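The sample-sharing idea in the abstract can be illustrated with ordinary importance sampling: one batch of data collected by a single behavior policy yields an unbiased estimate for every target policy. The sketch below is not the paper's tailored behavior policy. It assumes a one-step (bandit) problem, a uniform behavior policy as a placeholder, and hypothetical names (`is_estimates`, `rewards`, `targets`) chosen only for illustration.

```python
import random

def is_estimates(rewards, behavior, targets, n_samples, seed=0):
    """Estimate each target policy's expected reward from ONE shared batch
    of samples drawn with `behavior`, using ordinary importance sampling.

    Assumption: a one-step bandit with Gaussian reward noise; `behavior`
    must give nonzero probability to every action a target can take.
    """
    rng = random.Random(seed)
    actions = list(range(len(behavior)))
    totals = [0.0] * len(targets)
    for _ in range(n_samples):
        # Sample one action from the behavior policy and observe a noisy reward.
        a = rng.choices(actions, weights=behavior)[0]
        r = rewards[a] + rng.gauss(0.0, 1.0)
        # Reuse the same sample for every target by reweighting it.
        for i, pi in enumerate(targets):
            totals[i] += pi[a] / behavior[a] * r
    return [t / n_samples for t in totals]

# Hypothetical problem: three actions, two target policies.
rewards = [1.0, 2.0, 3.0]
behavior = [1 / 3, 1 / 3, 1 / 3]  # placeholder, not a variance-optimized policy
targets = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]

estimates = is_estimates(rewards, behavior, targets, n_samples=20000)
true_values = [sum(p * r for p, r in zip(pi, rewards)) for pi in targets]
print(estimates, true_values)
```

The variance of each estimate depends on the choice of `behavior`. The paper's contribution, per the abstract, is to design that policy so the variance is reduced across all targets at once, which this uniform placeholder does not attempt.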
