MLLGNEMay 24, 2022

Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control

arXiv:2206.06090v11 citationsh-index: 44
Originality Incremental advance
AI Analysis

This work addresses the problem of high regret in black-box optimization for robotics and other domains, offering an incremental improvement over existing methods.

The paper tackles the limitations of stochastic black-box optimizers like CMA-ES and MORE by improving MORE with decoupled updates for mean and covariance, enhanced entropy scheduling, and simplified model learning, achieving competitive results on benchmarks and outperforming ranking-based methods in regret on RL tasks.

Most successful stochastic black-box optimizers, such as CMA-ES, use rankings of the individual samples to obtain a new search distribution. Yet, the use of rankings also introduces several issues such as the underlying optimization objective is often unclear, i.e., we do not optimize the expected fitness. Further, while these algorithms typically produce a high-quality mean estimate of the search distribution, the produced samples can have poor quality as these algorithms are ignorant of the regret. Lastly, noisy fitness function evaluations may result in solutions that are highly sub-optimal on expectation. In contrast, stochastic optimizers that are motivated by policy gradients, such as the Model-based Relative Entropy Stochastic Search (MORE) algorithm, directly optimize the expected fitness function without the use of rankings. MORE can be derived by applying natural policy gradients and compatible function approximation, and is using information theoretic constraints to ensure the stability of the policy update. While MORE does not suffer from the listed limitations, it often cannot achieve state of the art performance in comparison to ranking based methods. We improve MORE by decoupling the update of the mean and covariance of the search distribution allowing for more aggressive updates on the mean while keeping the update on the covariance conservative, an improved entropy scheduling technique based on an evolution path which results in faster convergence and a simplified and more effective model learning approach in comparison to the original paper. We compare our algorithm to state of the art black-box optimization algorithms on standard optimization tasks as well as on episodic RL tasks in robotics where it is also crucial to have small regret. We obtain competitive results on benchmark functions and clearly outperform ranking-based methods in terms of regret on the RL tasks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes