AI, LG | Nov 29, 2019

Distributed Soft Actor-Critic with Multivariate Reward Representation and Knowledge Distillation

arXiv:1911.13056v2 | 3.61 citations | Has Code

Originality: Incremental advance

AI Analysis

This is an incremental advance in reinforcement learning for physics-based locomotion tasks.

The paper tackles the NeurIPS 2019 Learning to Move - Walk Around challenge with a distributed Soft Actor-Critic method and a novel Q-function split technique, achieving a mean reward of 1303.727 points and third place in the competition.

In this paper, we describe the physics-based environment of the NeurIPS 2019 Learning to Move - Walk Around challenge and present our solution to this competition, which scored a mean reward of 1303.727 points and took 3rd place. Our method combines recent advances from both continuous- and discrete-action-space reinforcement learning, such as Soft Actor-Critic and Recurrent Experience Replay in Distributed Reinforcement Learning. We trained our agent in two stages: in the first stage, to move in any direction, and in the second stage, to follow the target velocity field. We also introduce a novel Q-function split technique, which we believe facilitates training the agent, allows the critic to be pretrained and reused for solving harder problems, and reduces the effort spent on reward shaping design.
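The abstract does not spell out how the Q-function split works. One plausible reading, suggested by the title's "multivariate reward representation", is that the reward is decomposed into components and the critic predicts one Q-value per component, with the policy optimizing their (possibly weighted) sum. The Python sketch below illustrates that idea under these assumptions only: the function names (`split_soft_targets`, `scalar_q`, `critic_loss`), the even split of the entropy bonus across components, and the single (non-twin) critic are hypothetical choices, not the authors' implementation.

```python
from typing import List, Optional, Sequence


def split_soft_targets(
    rewards: Sequence[float],   # r^(k): one reward component per entry (assumed decomposition)
    next_q: Sequence[float],    # target critic's Q^(k)(s', a') for each component
    next_log_prob: float,       # log pi(a' | s') for a' sampled from the current policy
    gamma: float = 0.99,        # discount factor
    alpha: float = 0.2,         # SAC entropy temperature
    done: bool = False,         # terminal transition: no bootstrapping
) -> List[float]:
    """Per-component soft Bellman targets.

    y^(k) = r^(k) + gamma * (Q^(k)(s', a') - (alpha / K) * log pi(a' | s'))
    The entropy bonus is divided evenly over the K components (an assumption),
    so the targets sum to the ordinary scalar SAC target.
    """
    if len(rewards) != len(next_q):
        raise ValueError("rewards and next_q must have the same number of components")
    k = len(rewards)
    not_done = 0.0 if done else 1.0
    return [
        r + gamma * not_done * (q - (alpha / k) * next_log_prob)
        for r, q in zip(rewards, next_q)
    ]


def scalar_q(q_components: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Collapse the vector critic into the scalar Q the actor maximizes (weighted sum)."""
    if weights is None:
        weights = [1.0] * len(q_components)
    return sum(w * q for w, q in zip(weights, q_components))


def critic_loss(pred_q: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared TD error, averaged over the reward components."""
    return sum((p - t) ** 2 for p, t in zip(pred_q, targets)) / len(pred_q)


if __name__ == "__main__":
    # Two hypothetical components, e.g. a velocity-tracking term and an effort penalty.
    targets = split_soft_targets([1.0, -0.2], [10.0, -2.0], next_log_prob=-1.5)
    print(targets, sum(targets))
```

Summing the per-component targets recovers the standard scalar SAC target r + γ(Q(s', a') - α log π(a'|s')), which keeps the split consistent with ordinary SAC. Keeping the components separate is one way a critic could be pretrained on some reward terms and reused when further terms are added later, as the abstract suggests, though the paper's exact mechanism may differ.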

