On Accelerated Perceptrons and Beyond
This work provides a unified framework for accelerating Perceptron algorithms, offering faster convergence for machine learning practitioners dealing with linear classification and optimization problems.
The paper unifies recent accelerated Perceptron algorithms under a min-max framework using optimistic online learning, and demonstrates improved convergence rates for margin maximization (O(1/t^2) vs. O(log t/t^2)), Nesterov's accelerated gradient descent (first implicit bias result with O(1/t^2) rate), and p-norm Perceptron (Ω(√((p-1)log n)/γ) vs. Ω((p-1)/γ^2)).
The classical Perceptron algorithm of Rosenblatt can be used to find a linear threshold function to correctly classify $n$ linearly separable data points, assuming the classes are separated by some margin $γ> 0$. A foundational result is that Perceptron converges after $Ω(1/γ^{2})$ iterations. There have been several recent works that managed to improve this rate by a quadratic factor, to $Ω(\sqrt{\log n}/γ)$, with more sophisticated algorithms. In this paper, we unify these existing results under one framework by showing that they can all be described through the lens of solving min-max problems using modern acceleration techniques, mainly through optimistic online learning. We then show that the proposed framework also lead to improved results for a series of problems beyond the standard Perceptron setting. Specifically, a) For the margin maximization problem, we improve the state-of-the-art result from $O(\log t/t^2)$ to $O(1/t^2)$, where $t$ is the number of iterations; b) We provide the first result on identifying the implicit bias property of the classical Nesterov's accelerated gradient descent (NAG) algorithm, and show NAG can maximize the margin with an $O(1/t^2)$ rate; c) For the classical $p$-norm Perceptron problem, we provide an algorithm with $Ω(\sqrt{(p-1)\log n}/γ)$ convergence rate, while existing algorithms suffer the $Ω({(p-1)}/γ^2)$ convergence rate.