Communication-Efficient Asynchronous Stochastic Frank-Wolfe over Nuclear-norm Balls
This work addresses efficiency bottlenecks in large-scale distributed machine learning and is relevant to both researchers and practitioners, though the contribution is incremental in that it builds on existing Frank-Wolfe methods.
The paper tackles synchronization slowdown and high communication costs in distributed machine learning under nuclear-norm constraints. It proposes an asynchronous Stochastic Frank-Wolfe method (SFW-asyn) that maintains the same convergence rate as vanilla SFW and, in experiments on Amazon EC2, achieves speed-ups that are nearly linear in the number of machines.
Large-scale machine learning training suffers from two primary challenges, particularly for nuclear-norm constrained problems on distributed systems: synchronization slowdown caused by straggling workers, and high communication costs. In this work, we propose an asynchronous Stochastic Frank-Wolfe (SFW-asyn) method, which, for the first time, solves both problems simultaneously while maintaining the same convergence rate as the vanilla SFW. We implement our algorithm in Python (with MPI) to run on Amazon EC2, and demonstrate that SFW-asyn yields speed-ups almost linear in the number of machines compared to the vanilla SFW.
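
To make the communication argument concrete, the following is a minimal single-machine sketch of one Frank-Wolfe step over a nuclear-norm ball. It is not the authors' SFW-asyn implementation and contains no MPI or asynchrony. It only illustrates a standard fact: the linear minimization step over the ball of radius theta returns a rank-one matrix, so an update can be described by two vectors rather than a full matrix. The function names (nuclear_lmo, fw_step), the power-iteration solver, the step size 2/(k+2), and the assumption that G is a nonzero stochastic (minibatch) gradient are illustrative choices, not details taken from the paper.

```python
import math
import random


def _normalize(x):
    """Scale x to unit Euclidean norm (returned unchanged if it is all zeros)."""
    nrm = math.sqrt(sum(v * v for v in x))
    return x if nrm == 0.0 else [v / nrm for v in x]


def _matvec(A, x):
    """Compute A @ x for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def _rmatvec(A, y):
    """Compute A^T @ y for a list-of-rows matrix A."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def nuclear_lmo(G, iters=200, seed=0):
    """Approximate the top singular vector pair (u, v) of G by power iteration.

    The minimizer of <G, S> over the nuclear-norm ball of radius theta is
    S = -theta * u v^T, so (u, v) fully describes the update direction.
    Assumes G is nonzero (hypothetical helper, not from the paper).
    """
    rng = random.Random(seed)
    v = _normalize([rng.gauss(0.0, 1.0) for _ in G[0]])
    u = _normalize(_matvec(G, v))
    for _ in range(iters):
        u = _normalize(_matvec(G, v))
        v = _normalize(_rmatvec(G, u))
    return u, v


def fw_step(X, u, v, theta, k):
    """Apply X <- (1 - eta) * X + eta * (-theta * u v^T) with eta = 2 / (k + 2)."""
    eta = 2.0 / (k + 2)
    return [
        [(1.0 - eta) * X[i][j] - eta * theta * u[i] * v[j] for j in range(len(v))]
        for i in range(len(u))
    ]


# Example: a 2 x 2 gradient whose top singular pair is aligned with the first axis.
G = [[3.0, 0.0], [0.0, 1.0]]
u, v = nuclear_lmo(G)
X0 = [[0.0, 0.0], [0.0, 0.0]]
X1 = fw_step(X0, u, v, theta=2.0, k=0)
```

For an m x n iterate, the pair (u, v) holds m + n numbers while the full matrix holds m * n, which is the kind of saving that a communication-efficient scheme can exploit. How SFW-asyn schedules and applies such updates across workers is not described in this abstract.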