Guiding Evolutionary Strategies by Differentiable Robot Simulators
This work addresses the high sample cost that researchers and practitioners face when using Evolutionary Strategies in robotics, though the contribution appears incremental because it builds on existing methods.
The paper tackles the sample inefficiency of Evolutionary Strategies in robotic policy search by combining them with Differentiable Robot Simulators; preliminary results suggest a 3x-5x reduction in sample complexity on both simulated and real-world tasks.
In recent years, Evolutionary Strategies have been actively explored for policy search in robotic tasks, as they provide a simpler alternative to reinforcement learning algorithms. However, this class of algorithms is often claimed to be extremely sample-inefficient. At the same time, there is growing interest in Differentiable Robot Simulators (DRS), as they can potentially find successful policies with only a handful of trajectories. The resulting gradient, however, is not always useful for first-order optimization. In this work, we demonstrate how the DRS gradient can be used in conjunction with Evolutionary Strategies. Preliminary results suggest that this combination can reduce the sample complexity of Evolutionary Strategies by 3x-5x in both simulation and the real world.
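The abstract does not say how the DRS gradient enters the search. As an illustration only, the sketch below uses one published way of mixing an unreliable gradient into ES, the Guided ES scheme of Maheswaranathan et al. (2019): perturbations are drawn from a Gaussian elongated along the surrogate gradient, and the update uses the usual antithetic finite-difference estimate. Whether the paper uses this exact scheme is an assumption. The names `guided_es_step`, `toy_loss`, `noisy_sim_grad`, and `run_demo` are hypothetical, and the quadratic loss with a noise-corrupted gradient merely stands in for a robot rollout and a simulator gradient.

```python
import math
import random


def guided_es_step(f, theta, surrogate_grad, rng, sigma=0.1, alpha=0.5,
                   pairs=4, lr=0.1, beta=1.0):
    """One antithetic ES step with search elongated along surrogate_grad."""
    n = len(theta)
    norm = math.sqrt(sum(g * g for g in surrogate_grad))
    if norm > 0.0:
        # Unit vector spanning a one-dimensional guiding subspace.
        u = [g / norm for g in surrogate_grad]
    else:
        # No usable direction: fall back to purely isotropic perturbations.
        u, alpha = [0.0] * n, 1.0
    a = math.sqrt(alpha / n)      # weight of the full-space component
    b = math.sqrt(1.0 - alpha)    # weight of the guided component
    grad_est = [0.0] * n
    for _ in range(pairs):
        w = rng.gauss(0.0, 1.0)   # scalar coordinate along u
        eps = [sigma * (a * rng.gauss(0.0, 1.0) + b * w * u_i) for u_i in u]
        # Each antithetic pair costs two evaluations (two rollouts on a robot).
        f_plus = f([t + e for t, e in zip(theta, eps)])
        f_minus = f([t - e for t, e in zip(theta, eps)])
        for i in range(n):
            grad_est[i] += eps[i] * (f_plus - f_minus)
    scale = beta / (2.0 * sigma ** 2 * pairs)
    return [t - lr * scale * g for t, g in zip(theta, grad_est)]


def toy_loss(theta):
    """Stand-in for a rollout cost (assumption: a simple quadratic)."""
    return sum(t * t for t in theta)


def noisy_sim_grad(theta, rng, noise=0.5):
    """Stand-in for a DRS gradient: the true gradient plus Gaussian noise."""
    return [2.0 * t + noise * rng.gauss(0.0, 1.0) for t in theta]


def run_demo(steps=60, seed=0, dim=20):
    """Run guided ES on the toy problem; return (initial_loss, final_loss)."""
    rng = random.Random(seed)
    theta = [1.0] * dim
    initial = toy_loss(theta)
    for _ in range(steps):
        grad = noisy_sim_grad(theta, rng)
        theta = guided_es_step(toy_loss, theta, grad, rng)
    return initial, toy_loss(theta)
```

Calling `run_demo()` returns the toy loss before and after optimization. Setting `alpha=1.0` in `guided_es_step` removes the guided component and recovers isotropic ES, which gives a baseline for counting how many evaluations each variant needs on this toy problem.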