LG · AI · Oct 6, 2021

Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

arXiv:2110.02421v14.42 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper offers theoretical insight for reinforcement learning practitioners, though the advance is incremental because it builds on existing algorithms.

The paper explains off-policy actor-critic algorithms by decomposing the policy evaluation error into a Bellman error, a bias from policy mismatch, and a variance term from sampling. It shows that Emphasizing Recent Experience sampling and 1/age weighted sampling both yield smaller bias and variance than uniform sampling.

Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show that their policy evaluation error on the distribution of transitions decomposes into a Bellman error, a bias from policy mismatch, and a variance term from sampling. By comparing the magnitude of bias and variance, we explain the success of the Emphasizing Recent Experience sampling and 1/age weighted sampling. Both sampling strategies yield smaller bias and variance and are hence preferable to uniform sampling.
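The three replay-buffer sampling strategies compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the convention that the newest transition has age 1, and the default values of `eta` and `c_min` are assumptions made for illustration. The window schedule follows the form commonly given for Emphasizing Recent Experience, where the k-th of K updates samples uniformly from the most recent max(N * eta^(1000k/K), c_min) transitions.

```python
import random


def uniform_weights(n):
    """Uniform sampling: every stored transition is equally likely."""
    return [1.0 / n] * n


def inverse_age_weights(n):
    """1/age weighted sampling over a buffer of n transitions.

    Assumption: index n-1 is the newest transition and has age 1,
    index 0 is the oldest and has age n.
    """
    raw = [1.0 / (n - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def ere_window(n, k, num_updates, eta=0.996, c_min=5000):
    """Number of most-recent transitions to sample from at update k.

    Assumed ERE-style schedule: the window shrinks geometrically over the
    num_updates updates and is never smaller than c_min or larger than n.
    """
    c_k = int(n * eta ** (k * 1000 / num_updates))
    return min(n, max(c_k, c_min))


def sample_indices(weights, batch_size, rng):
    """Draw buffer indices with replacement according to the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000

    # Uniform versus 1/age weighted minibatches over the whole buffer.
    uniform_batch = sample_indices(uniform_weights(n), 256, rng)
    age_batch = sample_indices(inverse_age_weights(n), 256, rng)

    # ERE-style minibatch: uniform over the most recent `window` transitions.
    window = ere_window(n, k=5, num_updates=10)
    ere_batch = [n - window + i
                 for i in sample_indices(uniform_weights(window), 256, rng)]

    print(sum(uniform_batch) / 256, sum(age_batch) / 256, sum(ere_batch) / 256)
```

Both non-uniform schemes shift probability mass toward recently collected transitions, which were generated by policies closer to the current one. This is the policy-mismatch bias the abstract refers to.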
