DiGrad: Multi-Task Reinforcement Learning with Shared Actions
This work addresses the challenge of multi-task learning in robotics and is relevant to researchers and practitioners, but the contribution is incremental: it builds on existing policy gradient methods and adds a novel heuristic.
The paper tackles the problem of inefficient multi-task reinforcement learning in complex robotic systems where tasks share actions, proposing DiGrad (Differential Policy Gradient) to enable stable and efficient training. The results show that DiGrad outperforms related methods in continuous action spaces, as tested on an 8-link planar manipulator and a 27-DoF humanoid for multi-goal reachability tasks.
Most reinforcement learning algorithms are inefficient at learning multiple tasks in complex robotic systems, where different tasks share a set of actions. In such environments, a compound policy that performs multiple tasks concurrently may be learnt with shared neural network parameters. However, such a compound policy may become biased towards one task, or the gradients from different tasks may negate each other, making learning unstable and sometimes less data efficient. In this paper, we propose a new approach for simultaneous training of multiple tasks that share a set of common actions in continuous action spaces, which we call DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update to further improve learning. The proposed architecture was tested on an 8-link planar manipulator and a 27 degrees of freedom (DoF) humanoid, learning multi-goal reachability tasks for 3 and 2 end effectors, respectively. We show that our approach supports efficient multi-task learning in complex robotic systems, outperforming related methods in continuous action spaces.
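The instability described in the abstract, where gradients from different tasks negate each other on shared actions, can be illustrated with a toy example. The sketch below is not the DiGrad update and is not taken from the paper. It assumes two hypothetical tasks, each with a quadratic loss that pulls the same two shared actions towards opposite targets, and shows that naively summing the per-task gradients leaves no learning signal. All names (`task_gradient`, `cosine_similarity`, the target values) are illustrative assumptions.

```python
import math


def task_gradient(shared_action, target):
    """Gradient of the toy loss sum((a - t)^2) with respect to each shared action."""
    return [2.0 * (a - t) for a, t in zip(shared_action, target)]


def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


# Hypothetical setup: two tasks act through the same two shared actions
# (for example, joints used by more than one end effector).
shared_action = [0.0, 0.0]
target_task_1 = [1.0, 0.5]    # assumed preference of task 1
target_task_2 = [-1.0, -0.5]  # assumed preference of task 2 (opposite)

grad_1 = task_gradient(shared_action, target_task_1)
grad_2 = task_gradient(shared_action, target_task_2)

# Naive compound update: add the per-task gradients on the shared actions.
summed = [g1 + g2 for g1, g2 in zip(grad_1, grad_2)]

# A similarity close to -1 means the two tasks pull in opposite directions.
similarity = cosine_similarity(grad_1, grad_2)

print("task 1 gradient:", grad_1)
print("task 2 gradient:", grad_2)
print("summed gradient:", summed)
print("cosine similarity:", similarity)
```

In this contrived case the summed gradient is zero even though neither task is solved, which is the kind of cancellation that motivates treating shared actions separately in the policy gradient update.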