LG · Dec 25, 2017

Learning to Run with Actor-Critic Ensemble

arXiv:1712.08987v1 · 25 citations
Originality: Incremental advance
AI Analysis

This work improves reinforcement learning performance on locomotion tasks; it is an incremental improvement over existing methods.

The paper improves the Deep Deterministic Policy Gradient (DDPG) algorithm with an Actor-Critic Ensemble (ACE) method: at inference time, a critic ensemble selects the best action from the proposals of multiple actors. The method placed 2nd in the NIPS'17 Learning to Run competition.

We introduce an Actor-Critic Ensemble (ACE) method for improving the performance of the Deep Deterministic Policy Gradient (DDPG) algorithm. At inference time, our method uses a critic ensemble to select the best action from proposals of multiple actors running in parallel. By having a larger candidate set, our method can avoid actions that have fatal consequences, while staying deterministic. Using ACE, we won 2nd place in the NIPS'17 Learning to Run competition under the name "Megvii-hzwer".
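
The inference-time selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `ace_select_action`, the toy actors and critics, and the choice to average the critics' Q-values are all assumptions made here for clarity. In practice the actors and critics would be trained DDPG networks.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
Actor = Callable[[State], Action]           # deterministic policy: state -> action
Critic = Callable[[State, Action], float]   # Q-function: (state, action) -> value


def ace_select_action(
    state: State, actors: Sequence[Actor], critics: Sequence[Critic]
) -> Tuple[Action, List[float]]:
    """Pick the actor proposal with the highest ensemble score.

    Assumption: the ensemble score is the mean of the critics' Q-values.
    """
    # Every actor proposes one candidate action for the current state.
    proposals = [actor(state) for actor in actors]
    # Score each candidate with the whole critic ensemble.
    scores = [mean(critic(state, a) for critic in critics) for a in proposals]
    # Deterministic choice: highest score wins, ties go to the first proposal.
    best = max(range(len(proposals)), key=scores.__getitem__)
    return proposals[best], scores


# Hypothetical stand-ins for trained networks (illustration only).
toy_actors = [lambda s: [0.0], lambda s: [0.5], lambda s: [1.0]]
toy_critics = [
    lambda s, a: -(a[0] - 0.4) ** 2,
    lambda s, a: -(a[0] - 0.6) ** 2,
]

action, scores = ace_select_action([0.0], toy_actors, toy_critics)
print(action, scores)
```

Because both the proposals and the argmax are deterministic, the combined policy stays deterministic, while a poor proposal from any single actor can be outvoted by the critics.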

Code Implementations: 2 repos
