Information-geometric optimization with natural selection
This work proposes a simple, derivative-free optimization algorithm relevant to machine learning and AI. The contribution is incremental: it builds on existing evolutionary strategies and achieves similar performance.
The paper addresses derivative-free optimization of continuous objective functions by formulating a new evolutionary algorithm inspired by natural selection and population genetics. It shows that intermediate selection is most informative about the fitness landscape and introduces a recombination operator that preserves normal statistics. The resulting algorithm performs similarly to existing methods such as covariance matrix adaptation.
Evolutionary algorithms, inspired by natural evolution, aim to optimize difficult objective functions without computing derivatives. Here we detail the relationship between population genetics and evolutionary optimization and formulate a new evolutionary algorithm. Optimization of a continuous objective function is analogous to searching for high-fitness phenotypes on a fitness landscape. We summarize how natural selection moves a population along the non-Euclidean gradient that is induced by the population on the fitness landscape (the natural gradient). Under normal approximations common in quantitative genetics, we show how selection is related to Newton's method in optimization. We find that intermediate selection is most informative about the fitness landscape. We describe the generation of new phenotypes and introduce an operator that recombines the whole population to generate variants that preserve normal statistics. Finally, we introduce a proof-of-principle algorithm that combines natural selection, our recombination operator, and an adaptive method to increase selection. Our algorithm is similar to covariance matrix adaptation and natural evolutionary strategies in optimization, and has similar performance. The algorithm is extremely simple to implement, with no matrix inversion or factorization, does not require storing a covariance matrix, and may form the basis of more general model-based optimization algorithms with natural gradient updates.
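The three ingredients named in the abstract (selection, whole-population recombination, and gradually increasing selection) can be illustrated with a minimal sketch. This is not the authors' algorithm, and every specific choice in it is an assumption made for illustration: Boltzmann weights exp(beta * fitness) stand in for natural selection, each new variant is drawn as the weighted mean plus a sum of population deviations scaled by independent standard normal coefficients, and the selection strength `beta` simply grows by a fixed factor `beta_growth` per generation instead of being adapted. The function and parameter names (`select_weights`, `recombine`, `optimize`, `beta`, `beta_growth`) are hypothetical.

```python
import math
import random


def select_weights(fitness, beta):
    """Boltzmann selection (assumed form): weights proportional to exp(beta * fitness)."""
    top = max(fitness)
    # Subtracting the maximum keeps every exponent <= 0 and avoids overflow.
    raw = [math.exp(beta * (f - top)) for f in fitness]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean(population, weights):
    """Mean phenotype of the population after selection."""
    dim = len(population[0])
    return [sum(w * x[k] for w, x in zip(weights, population)) for k in range(dim)]


def recombine(population, weights, mean, rng):
    """Draw one variant as mean + sum_i sqrt(w_i) * z_i * (x_i - mean), z_i ~ N(0, 1).

    Given the population, the variant is normally distributed with the
    weighted mean and weighted covariance of the selected population.
    No covariance matrix is stored, inverted, or factorized.
    """
    child = list(mean)
    for w, x in zip(weights, population):
        coeff = math.sqrt(w) * rng.gauss(0.0, 1.0)
        for k in range(len(child)):
            child[k] += coeff * (x[k] - mean[k])
    return child


def optimize(fitness_fn, dim, pop_size=50, generations=200,
             beta=1.0, beta_growth=1.02, seed=0):
    """Maximize fitness_fn; returns the selected mean of the last generation."""
    rng = random.Random(seed)
    # Assumed starting point: a standard normal population around the origin.
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    mean = [0.0] * dim
    for _ in range(generations):
        fitness = [fitness_fn(x) for x in population]
        weights = select_weights(fitness, beta)
        mean = weighted_mean(population, weights)
        population = [recombine(population, weights, mean, rng)
                      for _ in range(pop_size)]
        # Placeholder schedule (assumption): strengthen selection geometrically.
        beta *= beta_growth
    return mean


if __name__ == "__main__":
    # Toy fitness landscape with a single peak at (1, -2).
    peak = optimize(lambda x: -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2), dim=2)
    print(peak)
```

Because each variant is a linear combination of independent normal coefficients, its covariance equals the weighted sum of outer products of the deviations, which is why the sketch never needs to form that matrix explicitly. With a finite population, the sampled variance shrinks slightly faster than under idealized selection, so the fixed growth schedule here should be read as a stand-in for a proper adaptive rule.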