Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
This work addresses the gap in theoretical analysis for BPPA in machine learning, offering insights into how divergence choice affects classifier quality, though it is incremental in extending known regularization concepts to Bregman methods.
The paper tackles the theoretical understanding of Bregman proximal point algorithm (BPPA) and mirror descent for learning linear classifiers with separable data, demonstrating provable algorithmic regularization and providing a tight lower bound on the margin that depends on the condition number of the Bregman divergence.
Bregman proximal point algorithm (BPPA) has witnessed emerging machine learning applications, yet its theoretical understanding has been largely unexplored. We study the computational properties of BPPA through learning linear classifiers with separable data, and demonstrate provable algorithmic regularization of BPPA. For any BPPA instantiated with a fixed Bregman divergence, we provide a lower bound of the margin obtained by BPPA with respect to an arbitrarily chosen norm. The obtained margin lower bound differs from the maximal margin by a multiplicative factor, which inversely depends on the condition number of the distance-generating function measured in the dual norm. We show that the dependence on the condition number is tight, thus demonstrating the importance of divergence in affecting the quality of the learned classifiers. We then extend our findings to mirror descent, for which we establish similar connections between the margin and Bregman divergence, together with a non-asymptotic analysis. Numerical experiments on both synthetic and real-world datasets are provided to support our theoretical findings. To the best of our knowledge, the aforementioned findings appear to be new in the literature of algorithmic regularization.