LG | AI | Oct 6, 2021

Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

arXiv:2110.02421v1 | 2 citations
Originality: Incremental advance
AI Analysis

The paper provides theoretical insights for reinforcement learning practitioners, though the advance is incremental because it builds on existing algorithms.

The paper explains off-policy actor-critic algorithms by decomposing the policy evaluation error into bias and variance components. It shows that Emphasizing Recent Experience sampling and 1/age weighted sampling both yield smaller bias and variance than uniform sampling.

Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show that their policy evaluation error on the distribution of transitions decomposes into a Bellman error, a bias from policy mismatch, and a variance term from sampling. By comparing the magnitude of bias and variance, we explain the success of the Emphasizing Recent Experience sampling and 1/age weighted sampling. Both sampling strategies yield smaller bias and variance and are hence preferable to uniform sampling.
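
The two sampling strategies named above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the convention that index 0 is the oldest transition, and the default values of `eta` and `c_min` are assumptions made for illustration. The window schedule in `ere_window` follows the commonly cited Emphasizing Recent Experience formulation (a window of recent transitions that shrinks over the updates of one training phase) and should be checked against the original ERE description before use.

```python
import random


def inverse_age_weights(buffer_size):
    """Return normalized probabilities proportional to 1/age.

    Assumed convention: index 0 is the oldest transition (age = buffer_size)
    and the last index is the newest transition (age = 1).
    """
    raw = [1.0 / (buffer_size - i) for i in range(buffer_size)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_inverse_age(buffer_size, batch_size, rng):
    """Draw replay-buffer indices with probability proportional to 1/age."""
    probs = inverse_age_weights(buffer_size)
    return rng.choices(range(buffer_size), weights=probs, k=batch_size)


def ere_window(buffer_size, k, num_updates, eta=0.996, c_min=5000):
    """Size of the recent-data window for the k-th update (1-based).

    Assumed schedule: c_k = max(N * eta^(k * 1000 / K), c_min), capped at N.
    """
    c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
    return min(buffer_size, max(c_k, c_min))


def sample_ere(buffer_size, batch_size, k, num_updates, rng,
               eta=0.996, c_min=5000):
    """Sample uniformly from the most recent ere_window(...) transitions."""
    window = ere_window(buffer_size, k, num_updates, eta, c_min)
    start = buffer_size - window
    return [rng.randrange(start, buffer_size) for _ in range(batch_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_inverse_age(buffer_size=10, batch_size=5, rng=rng))
    print(sample_ere(buffer_size=100_000, batch_size=5, k=500,
                     num_updates=1000, rng=rng))
```

Both samplers favor recent transitions, which were generated by policies closer to the current one. Uniform sampling corresponds to equal weights over the whole buffer.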

Code Implementations: 1 repo
