Leveraging Reward Gradients For Reinforcement Learning in Differentiable Physics Simulations
This work addresses a key bottleneck in applying differentiable simulators to robotic control: analytic reward gradients have so far been difficult to exploit in reinforcement learning. It is relevant to researchers and practitioners in reinforcement learning and robotics.
The paper tackles the challenge of using analytic reward gradients from differentiable physics simulators in reinforcement learning, where gradient-based approaches have so far underperformed gradient-free methods, and introduces a novel algorithm that outperforms state-of-the-art deep reinforcement learning on a set of nonlinear control problems.
In recent years, fully differentiable rigid body physics simulators have been developed, which can be used to simulate a wide range of robotic systems. In the context of reinforcement learning for control, these simulators theoretically allow algorithms to be applied directly to analytic gradients of the reward function. However, to date, these gradients have proved extremely challenging to use, and are outclassed by algorithms that use no gradient information at all. In this work we present a novel algorithm, cross entropy analytic policy gradients, that is able to leverage these gradients to outperform state-of-the-art deep reinforcement learning on a set of challenging nonlinear control problems.
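To make the notion of an analytic reward gradient concrete, the following is a minimal sketch in plain Python. It is not the paper's algorithm and does not use a rigid body simulator. It assumes a hypothetical one-dimensional point mass, a linear feedback policy with gains `kp` and `kd`, and a quadratic reward, all chosen purely for illustration. The derivative of the total reward with respect to the policy parameters is propagated through every simulation step and then checked against finite differences.

```python
# Toy illustration only: a 1-D unit point mass with a linear feedback policy.
# The dynamics, reward, and all names here are assumptions for this sketch,
# not the simulator or the algorithm from the paper.

def rollout(kp, kd, x0=1.0, v0=0.0, dt=0.05, steps=100, ctrl_cost=0.01):
    """Simulate the system and return (total_reward, (dR/dkp, dR/dkd))."""
    x, v = x0, v0
    dx = [0.0, 0.0]  # sensitivities of x with respect to (kp, kd)
    dv = [0.0, 0.0]  # sensitivities of v with respect to (kp, kd)
    total = 0.0
    grad = [0.0, 0.0]
    for _ in range(steps):
        # Policy: a = -(kp * x + kd * v), differentiated by the product rule.
        a = -(kp * x + kd * v)
        da = [
            -(x + kp * dx[0] + kd * dv[0]),
            -(v + kp * dx[1] + kd * dv[1]),
        ]
        # Semi-implicit Euler step, with the same update applied to the
        # sensitivities so the derivative flows through the dynamics.
        v = v + dt * a
        dv = [dv[i] + dt * da[i] for i in range(2)]
        x = x + dt * v
        dx = [dx[i] + dt * dv[i] for i in range(2)]
        # Reward penalises distance from the origin and control effort.
        total += -(x * x + ctrl_cost * a * a)
        for i in range(2):
            grad[i] += -(2.0 * x * dx[i] + 2.0 * ctrl_cost * a * da[i])
    return total, (grad[0], grad[1])


def finite_difference_grad(kp, kd, eps=1e-5):
    """Gradient estimate that uses reward values only (central differences)."""
    g_kp = (rollout(kp + eps, kd)[0] - rollout(kp - eps, kd)[0]) / (2 * eps)
    g_kd = (rollout(kp, kd + eps)[0] - rollout(kp, kd - eps)[0]) / (2 * eps)
    return g_kp, g_kd


def gradient_ascent(kp, kd, lr=1e-3, iters=20):
    """Improve the policy gains by following the analytic reward gradient."""
    for _ in range(iters):
        _, (g_kp, g_kd) = rollout(kp, kd)
        kp += lr * g_kp
        kd += lr * g_kd
    return kp, kd


if __name__ == "__main__":
    reward, analytic = rollout(1.0, 0.5)
    print("reward:", reward)
    print("analytic gradient:", analytic)
    print("finite-difference gradient:", finite_difference_grad(1.0, 0.5))
    print("gains after gradient ascent:", gradient_ascent(1.0, 0.5))
```

On a smooth, low-dimensional toy problem like this one, plain gradient ascent on the analytic gradient behaves well. The difficulty described in the abstract concerns far more complex rigid body systems and nonlinear control tasks, which this sketch does not attempt to reproduce.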