Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms
This work addresses scalarization problems in reinforcement learning and is aimed at researchers; its contribution is incremental, since it builds on existing A2C methods.
The paper tackles gradient overlapping and uncontrolled noise from entropy regularization in Advantage Actor Critic (A2C) algorithms, proposing techniques to avoid these issues and showing in pilot experiments that the method speeds up training.
This research investigates some of the issues that arise from the scalarization of the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm. The paper shows how a naive scalarization can lead to gradient overlapping. It also discusses the possibility that the entropy regularization term is a source of uncontrolled noise. To address the first issue, a technique that avoids gradient overlapping while keeping the same loss formulation is proposed. To address the second, a method that avoids the uncontrolled noise by sampling actions from distributions with a desired minimum entropy is investigated. Pilot experiments were carried out to show that the proposed method speeds up training. The proposed approach can be applied to any advantage-based reinforcement learning algorithm.
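The abstract does not say how a sampling distribution with a desired minimum entropy is constructed. The sketch below is one illustrative possibility and should not be read as the paper's method: it mixes the policy's action probabilities with the uniform distribution, using bisection to find the smallest mixing weight that reaches the target entropy. The names `entropy`, `with_min_entropy`, and `h_min` are hypothetical, and the example assumes a discrete action space.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats; terms with p == 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def with_min_entropy(p, h_min, iters=50):
    """Return a distribution with entropy >= h_min, close to p.

    Assumption (not from the paper): the distribution is obtained by
    mixing p with the uniform distribution u, (1 - a) * p + a * u.
    Entropy is concave and maximal at u, so it is nondecreasing in a
    on [0, 1], which makes bisection on a valid.
    """
    p = np.asarray(p, dtype=float)
    n = p.size
    if h_min > np.log(n):
        raise ValueError("h_min exceeds the maximum entropy log(n)")
    if entropy(p) >= h_min:
        return p  # already noisy enough, leave the policy untouched
    u = np.full(n, 1.0 / n)
    lo, hi = 0.0, 1.0  # invariant: the mixture at `hi` meets the target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy((1.0 - mid) * p + mid * u) >= h_min:
            hi = mid
        else:
            lo = mid
    return (1.0 - hi) * p + hi * u


# Usage: a nearly deterministic policy over four actions.
rng = np.random.default_rng(0)
policy = np.array([0.97, 0.01, 0.01, 0.01])
behaviour = with_min_entropy(policy, h_min=0.5)
action = int(rng.choice(len(behaviour), p=behaviour))
```

In this sketch the amount of exploration noise is set directly through `h_min` rather than indirectly through an entropy coefficient in the loss, which is the kind of control the abstract alludes to. How the resulting off-policy sampling should be accounted for in the update is not addressed here.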