LG | AI | ML | Oct 22, 2018

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

arXiv:1810.09103v4 | 20 citations
Originality: Incremental advance
AI Analysis

This work addresses policy optimization in reinforcement learning, offering a novel actor update that is less sensitive to entropy regularization, though it appears incremental because it builds on existing actor-critic frameworks.

The paper tackles policy improvement in actor-critic methods by introducing a conditional cross-entropy method (CCEM) for the actor update. The resulting Greedy AC algorithm empirically outperforms Soft Actor-Critic and is less sensitive to entropy regularization.

Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is learned to facilitate updating the parameterized policy (actor). The update to the actor involves a log-likelihood update weighted by the action-values, with the addition of entropy regularization for soft variants. In this work, we explore an alternative update for the actor, based on an extension of the cross-entropy method (CEM) to condition on inputs (states). The idea is to start with a broader policy and slowly concentrate around maximal actions, using a maximum likelihood update towards actions in the top percentile per state. The speed of this concentration is controlled by a proposal policy, which concentrates at a slower rate than the actor. We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values. We empirically show that our Greedy AC algorithm, which uses CCEM for the actor update, performs better than Soft Actor-Critic and is much less sensitive to entropy regularization.
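
The per-state idea in the abstract (sample actions from a proposal, keep the top percentile by action-value, move the actor towards them by maximum likelihood) can be illustrated with a small sketch. This is not the paper's implementation: it handles a single state with a one-dimensional Gaussian policy, replaces the learned critic with a fixed toy function `toy_q`, and slows the proposal down with a smaller step size, which is only a stand-in for however the paper makes the proposal concentrate more slowly. All function names and parameter values below are hypothetical.

```python
import numpy as np

def top_percentile_actions(actions, q_values, rho=0.2):
    """Return the actions whose values fall in the top `rho` fraction."""
    k = max(1, int(np.ceil(rho * len(actions))))
    elite_idx = np.argsort(q_values)[-k:]
    return actions[elite_idx]

def smoothed_mle_update(mean, std, elites, step):
    """Move a Gaussian (mean, std) part of the way towards the
    maximum-likelihood Gaussian fit of the elite actions."""
    new_mean = (1 - step) * mean + step * elites.mean()
    new_std = (1 - step) * std + step * elites.std()
    return new_mean, max(new_std, 1e-3)  # floor keeps sampling well defined

def run_cem_single_state(q_fn, n_iters=200, n_samples=30, rho=0.2,
                         actor_step=0.1, proposal_step=0.02, seed=0):
    """Run the sketch for one state; returns (actor, proposal) as (mean, std)."""
    rng = np.random.default_rng(seed)
    actor = (0.0, 2.0)     # both start broad
    proposal = (0.0, 2.0)
    for _ in range(n_iters):
        # Sample candidate actions from the (broader) proposal policy.
        actions = rng.normal(proposal[0], proposal[1], size=n_samples)
        elites = top_percentile_actions(actions, q_fn(actions), rho)
        # Assumption: the proposal uses the same update with a smaller
        # step, so it concentrates more slowly than the actor.
        actor = smoothed_mle_update(*actor, elites, actor_step)
        proposal = smoothed_mle_update(*proposal, elites, proposal_step)
    return actor, proposal

def toy_q(a):
    """Hypothetical fixed action-value function, maximized at a = 1."""
    return -(a - 1.0) ** 2

actor, proposal = run_cem_single_state(toy_q)
print("actor (mean, std):   ", actor)
print("proposal (mean, std):", proposal)
```

In this toy setting the actor's mean should drift towards the maximizing action while its spread shrinks faster than the proposal's. A conditional version would replace the two `(mean, std)` pairs with state-conditioned networks and the fixed `toy_q` with a learned critic.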

Code Implementations: 1 repo