OCLGNov 2, 2022

Quasi-Newton Steps for Efficient Online Exp-Concave Optimization

arXiv:2211.01357v310 citationsh-index: 13
Originality Highly original
AI Analysis

This work addresses computational bottlenecks in online optimization for machine learning practitioners, offering more efficient algorithms for exp-concave problems.

The paper tackles the computational inefficiency of online exp-concave optimization algorithms like Online Newton Step, which require costly projections, by using a self-concordant barrier regularizer to avoid projections and approximating Hessian inverses efficiently. This results in a total computational complexity of O(d^2 T + d^ω√T) for the online setting and O(d^3/ε) for the stochastic setting, answering an open question from 2013.

The aim of this paper is to design computationally-efficient and optimal algorithms for the online and stochastic exp-concave optimization settings. Typical algorithms for these settings, such as the Online Newton Step (ONS), can guarantee a $O(d\ln T)$ bound on their regret after $T$ rounds, where $d$ is the dimension of the feasible set. However, such algorithms perform so-called generalized projections whenever their iterates step outside the feasible set. Such generalized projections require $Ω(d^3)$ arithmetic operations even for simple sets such a Euclidean ball, making the total runtime of ONS of order $d^3 T$ after $T$ rounds, in the worst-case. In this paper, we side-step generalized projections by using a self-concordant barrier as a regularizer to compute the Newton steps. This ensures that the iterates are always within the feasible set without requiring projections. This approach still requires the computation of the inverse of the Hessian of the barrier at every step. However, using the stability properties of the Newton steps, we show that the inverse of the Hessians can be efficiently approximated via Taylor expansions for most rounds, resulting in a $O(d^2 T +d^ω\sqrt{T})$ total computational complexity, where $ω$ is the exponent of matrix multiplication. In the stochastic setting, we show that this translates into a $O(d^3/ε)$ computational complexity for finding an $ε$-suboptimal point, answering an open question by Koren 2013. We first show these new results for the simple case where the feasible set is a Euclidean ball. Then, to move to general convex set, we use a reduction to Online Convex Optimization over the Euclidean ball. Our final algorithm can be viewed as a more efficient version of ONS.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes