NEOCNov 1, 2015

LM-CMA: an Alternative to L-BFGS for Large Scale Black-box Optimization

arXiv:1511.00221v167 citations
Originality Highly original
AI Analysis

This addresses optimization challenges in scenarios like machine learning and engineering where gradients are not accessible, offering a stochastic derivative-free method with improved scalability and invariance properties.

The paper tackles the problem of large-scale black-box optimization where gradient information is unavailable, proposing LM-CMA as an alternative to L-BFGS. It shows that LM-CMA outperforms CMA-ES variants on non-separable ill-conditioned problems with a factor increasing with dimension and achieves comparable performance to L-BFGS on large-scale smooth and nonsmooth problems.

The limited memory BFGS method (L-BFGS) of Liu and Nocedal (1989) is often considered to be the method of choice for continuous optimization when first- and/or second- order information is available. However, the use of L-BFGS can be complicated in a black-box scenario where gradient information is not available and therefore should be numerically estimated. The accuracy of this estimation, obtained by finite difference methods, is often problem-dependent that may lead to premature convergence of the algorithm. In this paper, we demonstrate an alternative to L-BFGS, the limited memory Covariance Matrix Adaptation Evolution Strategy (LM-CMA) proposed by Loshchilov (2014). The LM-CMA is a stochastic derivative-free algorithm for numerical optimization of non-linear, non-convex optimization problems. Inspired by the L-BFGS, the LM-CMA samples candidate solutions according to a covariance matrix reproduced from $m$ direction vectors selected during the optimization process. The decomposition of the covariance matrix into Cholesky factors allows to reduce the memory complexity to $O(mn)$, where $n$ is the number of decision variables. The time complexity of sampling one candidate solution is also $O(mn)$, but scales as only about 25 scalar-vector multiplications in practice. The algorithm has an important property of invariance w.r.t. strictly increasing transformations of the objective function, such transformations do not compromise its ability to approach the optimum. The LM-CMA outperforms the original CMA-ES and its large scale versions on non-separable ill-conditioned problems with a factor increasing with problem dimension. Invariance properties of the algorithm do not prevent it from demonstrating a comparable performance to L-BFGS on non-trivial large scale smooth and nonsmooth optimization problems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes