Variational Gram Functions: Convex Analysis and Optimization
This work provides a new convex optimization tool for machine learning tasks such as hierarchical classification and multitask learning, though it appears incremental in extending existing regularization methods.
The authors introduce variational Gram functions (VGFs), convex penalty functions that promote pairwise relations such as orthogonality among vectors, and demonstrate their effectiveness on a hierarchical classification problem through numerical experiments.
We propose a new class of convex penalty functions, called \emph{variational Gram functions} (VGFs), that can promote pairwise relations, such as orthogonality, among a set of vectors in a vector space. These functions can serve as regularizers in convex optimization problems arising from hierarchical classification, multitask learning, and estimating vectors with disjoint supports, among other applications. We study the convexity of VGFs and give efficient characterizations of their convex conjugates, subdifferentials, and proximal operators. We discuss efficient optimization algorithms for regularized loss minimization problems where the loss admits a common, yet simple, variational representation and the regularizer is a VGF. These algorithms enjoy a simple kernel trick, an efficient line search, and computational advantages over first-order methods based on the subdifferential or proximal maps. We also establish a general representer theorem for such learning problems. Lastly, numerical experiments on a hierarchical classification problem are presented to demonstrate the effectiveness of VGFs and the associated optimization algorithms.
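
To make the idea of a penalty that promotes orthogonality concrete, the following is a minimal sketch, not taken from the abstract itself. It assumes one simple penalty of this flavor: a weighted sum of absolute pairwise inner products, computed from the Gram matrix of the vectors. The function names `gram_matrix` and `pairwise_penalty` and the weight matrix `off_diag` are hypothetical and chosen for illustration only. Whether such a penalty is convex depends on the choice of weights, which this sketch does not address.

```python
from typing import List, Sequence

Vector = Sequence[float]


def gram_matrix(vectors: Sequence[Vector]) -> List[List[float]]:
    """Return the Gram matrix G with G[i][j] = <x_i, x_j>."""
    return [
        [sum(a * b for a, b in zip(u, v)) for v in vectors]
        for u in vectors
    ]


def pairwise_penalty(
    vectors: Sequence[Vector], weights: Sequence[Sequence[float]]
) -> float:
    """Illustrative penalty (an assumption, not the paper's stated formula):
    sum over all pairs (i, j) of weights[i][j] * |<x_i, x_j>|.

    With zero weights on the diagonal, the value is zero exactly when
    every positively weighted pair of vectors is orthogonal.
    """
    gram = gram_matrix(vectors)
    m = len(vectors)
    return sum(
        weights[i][j] * abs(gram[i][j])
        for i in range(m)
        for j in range(m)
    )


# Hypothetical example data: two vectors in the plane.
off_diag = [[0.0, 1.0], [1.0, 0.0]]    # penalize only the (1, 2) pair
orthogonal = [[1.0, 0.0], [0.0, 1.0]]  # x_1 is orthogonal to x_2
aligned = [[1.0, 0.0], [1.0, 0.0]]     # x_1 equals x_2

penalty_orthogonal = pairwise_penalty(orthogonal, off_diag)
penalty_aligned = pairwise_penalty(aligned, off_diag)
```

In this sketch the orthogonal pair incurs no penalty while the aligned pair does, which is the behavior a regularizer of this kind would use to push vectors (for example, classifier weights at different nodes of a hierarchy) toward orthogonality.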