Scalable Optimal Margin Distribution Machine
This work targets computational efficiency for practitioners who use kernel methods; the contribution is incremental, since it builds on the existing ODM framework.
The paper tackles the scalability issues of the Optimal Margin Distribution Machine (ODM) by proposing a scalable version that achieves a nearly tenfold training speedup while largely preserving generalization performance.
The Optimal Margin Distribution Machine (ODM) is a recently proposed statistical learning framework rooted in the novel margin theory, and it has been shown to generalize better than its traditional large-margin counterparts. Nonetheless, like other kernel methods, it suffers from the ubiquitous scalability problem in both computation time and memory. This paper proposes a scalable ODM that achieves a nearly tenfold speedup over the original ODM training method. For nonlinear kernels, we propose a novel distribution-aware partition method so that the local ODM trained on each partition stays close to the global one and converges to it quickly. When the linear kernel is used, we extend a communication-efficient SVRG method to accelerate training further. Extensive empirical studies validate that the proposed method is highly computationally efficient and almost never worsens generalization.
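The abstract does not state the ODM objective or the details of the SVRG extension. As a minimal single-machine sketch, assuming the commonly used squared-hinge form of the linear ODM objective, $\min_w \tfrac12\|w\|^2 + \tfrac{\lambda}{m(1-\theta)^2}\sum_i(\xi_i^2 + \mu\,\epsilon_i^2)$ with $\xi_i = \max(0, 1-\theta-y_i w^\top x_i)$ and $\epsilon_i = \max(0, y_i w^\top x_i - 1-\theta)$, the code below runs plain SVRG on that finite sum. The function names and the hyperparameter values are hypothetical, and the sketch does not implement the paper's communication-efficient distributed variant or its distribution-aware partitioning.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def odm_component_grad(w, x, y, lam, mu, theta):
    """Gradient of g_i(w) = 0.5*||w||^2 + lam/(1-theta)^2 * (xi^2 + mu*eps^2).

    Assumed squared-hinge ODM form (hypothetical helper, not the paper's code).
    """
    margin = y * dot(w, x)
    xi = max(0.0, 1.0 - theta - margin)   # shortfall below 1 - theta
    eps = max(0.0, margin - 1.0 - theta)  # excess above 1 + theta
    coef = lam / (1.0 - theta) ** 2 * (-2.0 * xi + 2.0 * mu * eps) * y
    return [wj + coef * xj for wj, xj in zip(w, x)]


def odm_objective(w, X, Y, lam, mu, theta):
    """Full objective F(w), the average of g_i(w) over all m samples."""
    total = 0.0
    for x, y in zip(X, Y):
        margin = y * dot(w, x)
        xi = max(0.0, 1.0 - theta - margin)
        eps = max(0.0, margin - 1.0 - theta)
        total += xi ** 2 + mu * eps ** 2
    return 0.5 * dot(w, w) + lam / ((1.0 - theta) ** 2 * len(X)) * total


def full_grad(w, X, Y, lam, mu, theta):
    """Average of the component gradients at w (the SVRG snapshot gradient)."""
    m = len(X)
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        gi = odm_component_grad(w, x, y, lam, mu, theta)
        g = [a + b / m for a, b in zip(g, gi)]
    return g


def svrg_linear_odm(X, Y, lam=1.0, mu=0.5, theta=0.2, eta=0.02,
                    epochs=20, inner=None, seed=0):
    """Plain SVRG on the assumed linear ODM objective (hypothetical sketch)."""
    rng = random.Random(seed)
    m, d = len(X), len(X[0])
    inner = inner or 2 * m
    w_snap = [0.0] * d
    for _ in range(epochs):
        # Full gradient at the snapshot, reused for every inner step.
        g_full = full_grad(w_snap, X, Y, lam, mu, theta)
        w = list(w_snap)
        for _ in range(inner):
            i = rng.randrange(m)
            g_cur = odm_component_grad(w, X[i], Y[i], lam, mu, theta)
            g_old = odm_component_grad(w_snap, X[i], Y[i], lam, mu, theta)
            # Variance-reduced step: grad_i(w) - grad_i(w_snap) + full grad.
            w = [wj - eta * (a - b + c)
                 for wj, a, b, c in zip(w, g_cur, g_old, g_full)]
        w_snap = w  # last inner iterate becomes the next snapshot
    return w_snap
```

For example, on a tiny separable set such as `X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -1.5]]` with `Y = [1, 1, -1, -1]`, calling `svrg_linear_odm(X, Y)` returns a weight vector whose signs `y * dot(w, x)` agree with the labels. SVRG fits here because the assumed objective is smooth and strongly convex (the $\tfrac12\|w\|^2$ term), so the variance-reduced steps can use a constant step size.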