SCOPE: Scalable Composite Optimization for Learning on Spark
This work addresses scalability issues in distributed machine learning for practitioners using Spark, offering an incremental improvement in efficiency over existing distributed stochastic optimization methods.
The paper tackles the scalability limitations of distributed stochastic optimization (DSO) methods for large-scale composite optimization problems in machine learning. It proposes SCOPE, a novel DSO method implemented on Spark that achieves a linear convergence rate for convex objective functions and outperforms state-of-the-art distributed learning methods on Spark in empirical tests.
Many machine learning models, such as logistic regression~(LR) and support vector machines~(SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization~(DSO) methods have been proposed to solve large-scale composite optimization problems, and these methods have shown better performance than traditional batch methods. However, most of these DSO methods are not scalable enough. In this paper, we propose a novel DSO method, called \underline{s}calable \underline{c}omposite \underline{op}timization for l\underline{e}arning~({SCOPE}), and implement it on the fault-tolerant distributed platform \mbox{Spark}. SCOPE is both computation-efficient and communication-efficient. Theoretical analysis shows that SCOPE converges at a linear rate when the objective function is convex. Furthermore, empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.
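
To make the phrase "composite optimization problem" concrete, the following is a minimal sketch of one common form, assumed here for illustration: an average of per-example losses plus a regularizer, instantiated for L2-regularized logistic regression. The function names, the regularization weight `lam`, and the toy dataset are hypothetical and are not taken from the paper. The sketch only evaluates the objective and does not implement SCOPE itself.

```python
import math

def logistic_loss(w, x, y):
    """Loss of one example: log(1 + exp(-y * <w, x>)), with label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def l2_regularizer(w, lam):
    """Regularization term (lam / 2) * ||w||^2 (an assumed choice of regularizer)."""
    return 0.5 * lam * sum(wi * wi for wi in w)

def composite_objective(w, data, lam):
    """Average loss over all examples plus the regularizer."""
    n = len(data)
    avg_loss = sum(logistic_loss(w, x, y) for x, y in data) / n
    return avg_loss + l2_regularizer(w, lam)

# Hypothetical toy dataset: (feature vector, label) pairs.
data = [([1.0, 2.0], 1), ([-1.0, 0.5], -1)]
w0 = [0.0, 0.0]
value_at_zero = composite_objective(w0, data, lam=0.1)
```

At `w0 = [0, 0]` every margin is zero and the regularizer vanishes, so the objective equals log 2. Because the loss term is a sum over examples, it can be partitioned across workers, which is the structure that distributed methods exploit.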