MLLGJun 15, 2017

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

arXiv:1706.04892v144 citations
Originality Highly original
AI Analysis

This work addresses computational bottlenecks in online learning for researchers and practitioners using kernel methods, offering a more efficient second-order approach with theoretical guarantees.

The paper tackles the high computational complexity of second-order kernel online convex optimization methods by introducing KONS and Sketched-KONS, achieving O(d_eff log T) regret with reduced space and time complexity by a factor of γ^2 to O(t^2γ^2) per iteration, while incurring only 1/γ times more regret.

Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss posses stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $\mathcal{O}(\log(\text{Det}(\boldsymbol{K})))$ regret, which we show scales as $\mathcal{O}(d_{\text{eff}}\log T)$, where $d_{\text{eff}}$ is the effective dimension of the problem and is usually much smaller than $\mathcal{O}(\sqrt{T})$. The main drawback of second-order methods is their much higher $\mathcal{O}(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $\mathcal{O}(d_{\text{eff}}\log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $\boldsymbol{K}_t$, and show that for a chosen parameter $γ\leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $γ^2$ to $\mathcal{O}(t^2γ^2)$ space and time per iteration, while incurring only $1/γ$ times more regret.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes