LG · Oct 10, 2022

Actor-Critic or Critic-Actor? A Tale of Two Time Scales

arXiv:2210.04470v6 · 11 citations · h-index: 50
AI Analysis

This work, aimed at reinforcement learning researchers, addresses a theoretical and algorithmic nuance and offers an incremental alternative to the standard approach.

The paper revisits the standard actor-critic algorithm and proposes critic-actor, a variant with the time scales reversed that emulates value iteration instead of policy iteration. In empirical comparisons with and without function approximation, critic-actor performs on par with actor-critic in both accuracy and computational effort.
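To make "reversed time scales" concrete: in the usual two time-scale setup, write a(n) for the critic step size and b(n) for the actor step size (this notation is assumed here, not taken from the paper). Both sequences satisfy the standard step-size conditions, and the ratio between them decides which iterate is "slow":

```latex
\sum_{n} a(n) = \sum_{n} b(n) = \infty, \qquad
\sum_{n} \bigl(a(n)^2 + b(n)^2\bigr) < \infty, \qquad
\text{actor-critic: } \frac{b(n)}{a(n)} \to 0, \qquad
\text{critic-actor: } \frac{a(n)}{b(n)} \to 0.
```

For example, a(n) = (n+1)^{-0.6} and b(n) = (n+1)^{-0.9} give the actor-critic ordering, since b(n)/a(n) = (n+1)^{-0.3} goes to 0; swapping the two exponents gives the critic-actor ordering.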

We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation, with the value function computed on a faster time scale and the policy computed on a slower time scale. This emulates policy iteration. We observe that reversing the time scales in fact emulates value iteration and yields a legitimate algorithm. We provide a proof of convergence and compare the two empirically with and without function approximation (with both linear and nonlinear function approximators), and observe that our proposed critic-actor algorithm performs on par with actor-critic in terms of both accuracy and computational effort.
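A minimal tabular sketch of the two orderings follows. It assumes a hypothetical two-state toy MDP, a TD(0) critic, and a softmax policy-gradient actor driven by the TD error. These are generic textbook update rules, not the paper's exact algorithm or experimental setup. The only thing that changes between the two runs is which iterate gets the larger (faster) step size.

```python
import math
import random

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# Action 0 pays reward 1, action 1 pays 0; the next state is uniform
# regardless of the action, so action 0 is optimal in every state.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9


def env_step(rng, state, action):
    reward = 1.0 if action == 0 else 0.0
    return reward, rng.randrange(N_STATES)


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run(critic_exp, actor_exp, n_steps=20_000, seed=0):
    """Two time-scale loop with step sizes 1/(n+1)**exp.

    A smaller exponent means larger steps, i.e. the faster time scale.
    Actor-critic: critic_exp < actor_exp (critic is faster).
    Critic-actor: actor_exp < critic_exp (actor is faster).
    """
    rng = random.Random(seed)
    V = [0.0] * N_STATES                                   # critic: state values
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]   # actor: softmax preferences
    s = 0
    for n in range(n_steps):
        critic_lr = 1.0 / (n + 1) ** critic_exp
        actor_lr = 1.0 / (n + 1) ** actor_exp
        pi = softmax(theta[s])
        a = rng.choices(range(N_ACTIONS), weights=pi)[0]
        r, s_next = env_step(rng, s, a)
        delta = r + GAMMA * V[s_next] - V[s]   # TD(0) error
        V[s] += critic_lr * delta              # critic update
        for k in range(N_ACTIONS):             # actor: delta * grad of log pi(a|s)
            indicator = 1.0 if k == a else 0.0
            theta[s][k] += actor_lr * delta * (indicator - pi[k])
        s = s_next
    return V, [softmax(t) for t in theta]


# Same two exponents, roles swapped; both lie in (0.5, 1], so each
# step-size sequence sums to infinity while its squares sum finitely.
V_ac, pi_ac = run(critic_exp=0.6, actor_exp=0.9)   # actor-critic
V_ca, pi_ca = run(critic_exp=0.9, actor_exp=0.6)   # critic-actor
print("actor-critic P(action 0):", [round(p[0], 3) for p in pi_ac])
print("critic-actor P(action 0):", [round(p[0], 3) for p in pi_ca])
```

Under these assumptions both runs should end up preferring action 0 in each state. With the actor on the fast scale, it approximately tracks the greedy policy for the current slowly-moving value estimate, which is the intuition behind the value-iteration reading of critic-actor.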

Code Implementations: 1 repo