Distributed Weighted Parameter Averaging for SVM Training on Big Data
This incremental improvement addresses scalability issues for SVM training on big data, benefiting practitioners in machine learning and data science.
The paper tackles the trade-off between accuracy and efficiency in distributed SVM training by introducing weighted parameter averaging (WPA), which is more accurate than standard parameter averaging when the number of partitions is large and converges faster than ADMM in feature space.
Two popular approaches for distributed training of SVMs on big data are parameter averaging and ADMM. Parameter averaging is efficient but loses accuracy as the number of partitions increases, while ADMM in the feature space is accurate but converges slowly. In this paper, we report a hybrid approach called weighted parameter averaging (WPA), which optimizes the regularized hinge loss with respect to weights on the parameters learned on the individual partitions. The problem is shown to be equivalent to solving an SVM in a projected space. We also demonstrate an $O(\frac{1}{N})$ stability bound on the final hypothesis given by WPA, using novel proof techniques. Experimental results on a variety of toy and real-world datasets show that our approach is significantly more accurate than parameter averaging for a high number of partitions. The proposed method also converges much faster than ADMM in the feature space.
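To make the idea concrete, the following is a minimal single-machine sketch of weighted parameter averaging, not the paper's distributed implementation. It assumes a linear SVM without a bias term, trained with a Pegasos-style stochastic sub-gradient solver, and it assumes the regularizer is placed on the weight vector `beta`. The helper names (`train_linear_svm`, `weighted_parameter_averaging`, `make_toy_data`, `accuracy`) and the toy data are hypothetical and chosen only for illustration. Each partition yields a parameter vector; the data are then projected onto those vectors, and a second SVM in the projected space learns the weights used to combine them.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the
    L2-regularized hinge loss (no bias term; an assumption of this sketch)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            margin = y[i] * dot(w, X[i])     # margin under the current w
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:                 # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def weighted_parameter_averaging(X, y, n_parts=4, lam=0.01):
    """Return (w_wpa, w_pa): the weighted and the plain parameter average."""
    d = len(X[0])
    # Round-robin split of the sample indices into partitions.
    parts = [list(range(k, len(X), n_parts)) for k in range(n_parts)]
    # One SVM per partition; in a real deployment these run in parallel.
    W = [train_linear_svm([X[i] for i in idx], [y[i] for i in idx], lam, seed=k)
         for k, idx in enumerate(parts)]
    # Project every sample onto the partition-level parameter vectors.
    Z = [[dot(w_k, x) for w_k in W] for x in X]
    # An SVM in the projected space learns the combination weights beta.
    # (Solved centrally here for brevity; this is a simplification.)
    beta = train_linear_svm(Z, y, lam)
    w_wpa = [sum(b * w_k[j] for b, w_k in zip(beta, W)) for j in range(d)]
    w_pa = [sum(w_k[j] for w_k in W) / n_parts for j in range(d)]
    return w_wpa, w_pa


def make_toy_data(n=200, seed=1):
    """Hypothetical 2-D data, linearly separable with a margin."""
    rng = random.Random(seed)
    X, y = [], []
    while len(X) < n:
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        s = x[0] + 0.5 * x[1]
        if abs(s) >= 0.2:                    # drop points near the boundary
            X.append(x)
            y.append(1 if s > 0 else -1)
    return X, y


def accuracy(w, X, y):
    return sum(1 for x, t in zip(X, y) if t * dot(w, x) > 0) / len(X)


X, y = make_toy_data()
w_wpa, w_pa = weighted_parameter_averaging(X, y, n_parts=4)
print("WPA accuracy:", accuracy(w_wpa, X, y))
print("PA accuracy: ", accuracy(w_pa, X, y))
```

Plain parameter averaging corresponds to fixing every entry of `beta` at `1 / n_parts`; the sketch differs only in learning `beta` from data. On an easy toy problem like this one the two variants may perform similarly, so the example illustrates the mechanics rather than the accuracy gap reported in the experiments.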