LG · AI · Jul 12, 2021

Cautious Actor-Critic

arXiv:2107.05217v2 · 2 citations
AI Analysis

This paper addresses stability issues in actor-critic algorithms for continuous control. The contribution is incremental: it builds on the existing conservative policy iteration and conservative value iteration techniques.

The paper tackles the instability and errors of off-policy actor-critic learning by proposing the Cautious Actor-Critic (CAC) algorithm, which achieves performance comparable to state-of-the-art methods while significantly stabilizing learning on continuous control problems.

The oscillating performance of off-policy learning and the persistent errors in the actor-critic (AC) setting call for algorithms that learn conservatively and so better suit stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name "cautious" comes from its doubly conservative nature: we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.
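The two conservative ingredients named in the abstract can be illustrated with a small tabular sketch. This is not the paper's implementation, which targets continuous control. It assumes a single state with three discrete actions, and the function names, the temperature, and the mixing coefficient `zeta` are all hypothetical choices made for illustration. The target policy is a Boltzmann (softmax) policy of the kind entropy regularization yields, and the actor moves toward it by a convex mixture in the style of conservative policy iteration.

```python
import numpy as np


def softmax_policy(q_values, temperature=1.0):
    """Boltzmann policy over actions for one state (illustrative only)."""
    # Subtract the max before exponentiating for numerical stability.
    z = (q_values - q_values.max()) / temperature
    p = np.exp(z)
    return p / p.sum()


def interpolate_policy(pi_old, pi_target, zeta):
    """Policy interpolation: (1 - zeta) * pi_old + zeta * pi_target."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return (1.0 - zeta) * pi_old + zeta * pi_target


# Hypothetical action values for a single state with three actions.
q = np.array([1.0, 2.0, 0.5])
pi_old = np.full(3, 1.0 / 3.0)                   # current (uniform) policy
pi_target = softmax_policy(q, temperature=0.5)   # soft greedy target
pi_new = interpolate_policy(pi_old, pi_target, zeta=0.1)

# The mixture moves only a fraction zeta of the way toward the target.
step = np.abs(pi_new - pi_old).sum()
full = np.abs(pi_target - pi_old).sum()
```

A small `zeta` keeps each update close to the previous policy, which is the sense in which the interpolated update is cautious; `zeta = 1` recovers a plain jump to the soft greedy target.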

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
