Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
This work addresses a key bottleneck in applying evolution strategies to long inner problems with short truncations, benefiting researchers and practitioners in meta-learning and optimization.
The authors tackled the problem of high variance in gradient estimation for unrolled computation graphs by proposing ES-Single, an evolution strategies-based algorithm that is simpler to implement and has lower variance than Persistent Evolution Strategies (PES). They demonstrated empirically that ES-Single consistently outperforms PES on tasks like hyperparameter optimization and training recurrent neural networks, with variance that is constant with respect to the number of truncated unrolls.
We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. Like the recently proposed Persistent Evolution Strategies (PES), ES-Single is unbiased, and overcomes chaos arising from recursive function applications by smoothing the meta-loss landscape. ES-Single samples a single perturbation per particle, which is kept fixed over the course of an inner problem (i.e., perturbations are not re-sampled for each partial unroll). Compared to PES, ES-Single is simpler to implement and has lower variance: the variance of ES-Single is constant with respect to the number of truncated unrolls, removing a key barrier in applying ES to long inner problems using short truncations. We show that ES-Single is unbiased for quadratic inner problems, and demonstrate empirically that its variance can be substantially lower than that of PES. ES-Single consistently outperforms PES on a variety of tasks, including a synthetic benchmark task, hyperparameter optimization, training recurrent neural networks, and training learned optimizers.
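To make the sampling scheme concrete, the following is a minimal sketch of an estimator that draws one perturbation per particle and reuses it across every truncated unroll of an inner problem. It is an illustration under stated assumptions, not the authors' implementation: the antithetic pairing (theta + eps, theta - eps), the scalar quadratic inner problem, and all names (`unroll`, `ESSingleParticle`, `es_single_gradient`) are hypothetical choices made for this example.

```python
import random


def unroll(state, theta, num_steps, curvature=1.0):
    """Toy inner problem (assumed): gradient descent on f(s) = 0.5 * curvature * s**2
    with step size theta. Returns the new state and the summed loss over the unroll."""
    loss = 0.0
    for _ in range(num_steps):
        state = state - theta * curvature * state
        loss += 0.5 * curvature * state ** 2
    return state, loss


class ESSingleParticle:
    """One particle holding a perturbation that is fixed for a whole inner problem."""

    def __init__(self, init_state, sigma, rng):
        self.init_state = init_state
        self.sigma = sigma
        self.rng = rng
        self.reset()

    def reset(self):
        # The perturbation is sampled only here, at the start of an inner problem.
        self.eps = self.rng.gauss(0.0, self.sigma)
        self.state_pos = self.init_state
        self.state_neg = self.init_state

    def step(self, theta, num_steps):
        # Continue both perturbed trajectories from where the last truncation ended,
        # reusing the same eps rather than re-sampling it.
        self.state_pos, loss_pos = unroll(self.state_pos, theta + self.eps, num_steps)
        self.state_neg, loss_neg = unroll(self.state_neg, theta - self.eps, num_steps)
        # Antithetic ES estimate (assumed form) for this truncated unroll.
        return self.eps * (loss_pos - loss_neg) / (2.0 * self.sigma ** 2)


def es_single_gradient(theta, particles, num_steps):
    """Average the per-particle estimates for one truncated unroll."""
    return sum(p.step(theta, num_steps) for p in particles) / len(particles)


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [ESSingleParticle(init_state=1.0, sigma=0.01, rng=rng) for _ in range(8)]
    theta = 0.1
    for truncation in range(4):  # four truncated unrolls of one inner problem
        grad = es_single_gradient(theta, particles, num_steps=5)
        print(truncation, grad)
    for p in particles:  # inner problem finished: draw fresh perturbations
        p.reset()
```

The only state carried between truncations is each particle's pair of inner states and its fixed perturbation; nothing is accumulated across unrolls, which is the property the abstract ties to variance that stays constant in the number of truncated unrolls.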