Distributed Optimization for Over-Parameterized Learning
This work addresses communication bottlenecks in distributed learning systems. Rather than refining existing schemes that communicate frequently, it reaches a different conclusion under an assumption motivated by over-parameterization.
The paper tackles communication overhead in distributed optimization. It proves that, when the optimal sets of the local loss functions intersect, each node can perform an arbitrary number of local optimization steps between communication rounds, and that more local updating reduces overall communication. Experiments in distributed convex optimization and deep learning confirm the theory.
Distributed optimization often consists of two updating phases: local optimization and inter-node communication. Conventional approaches require working nodes to communicate with the server every one or a few iterations to guarantee convergence. In this paper, we establish a completely different conclusion: each node can perform an arbitrary number of local optimization steps before communication. Moreover, we show that more local updating can reduce the overall communication, even for an infinite number of steps, where each node is free to update its local model to near-optimality before exchanging information. The extra assumption we make is that the optimal sets of the local loss functions have a non-empty intersection, which is inspired by the over-parameterization phenomenon in large-scale optimization and deep learning. Our theoretical findings are confirmed by both distributed convex optimization and deep learning experiments.
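To make the setting concrete, the following toy sketch simulates local updating on a small over-parameterized least-squares problem. It is an illustration under stated assumptions, not the algorithm or the experiments of the paper: the server is assumed to average the local models, each node is assumed to run plain gradient descent, and all names (`make_problem`, `rounds_to_converge`, `safe_step_size`, and so on) are hypothetical. Every target is generated from a single ground-truth vector and the dimension exceeds the total number of samples, so the local optimal sets share at least that vector.

```python
import random


def make_problem(num_nodes=3, samples_per_node=2, dim=12, seed=0):
    """Build a consistent, over-parameterized least-squares problem.

    All targets come from one ground-truth vector and dim exceeds the total
    number of samples, so every node's set of minimizers contains that
    vector (the local optimal sets intersect).
    """
    rng = random.Random(seed)
    w_true = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    nodes = []
    for _ in range(num_nodes):
        rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                for _ in range(samples_per_node)]
        targets = [sum(a * w for a, w in zip(row, w_true)) for row in rows]
        nodes.append((rows, targets))
    return nodes


def local_loss(w, node):
    """Mean of half squared residuals on one node's data."""
    rows, targets = node
    residuals = [sum(a * x for a, x in zip(row, w)) - t
                 for row, t in zip(rows, targets)]
    return 0.5 * sum(r * r for r in residuals) / len(rows)


def local_gradient(w, node):
    """Gradient of local_loss with respect to w."""
    rows, targets = node
    grad = [0.0] * len(w)
    for row, t in zip(rows, targets):
        r = sum(a * x for a, x in zip(row, w)) - t
        for j, a in enumerate(row):
            grad[j] += r * a / len(rows)
    return grad


def global_loss(w, nodes):
    """Average of the local losses."""
    return sum(local_loss(w, node) for node in nodes) / len(nodes)


def safe_step_size(nodes):
    """Step size 1 / max_i trace(H_i), where H_i is node i's Hessian.

    The trace upper-bounds the largest eigenvalue, so local gradient
    descent with this step size cannot diverge.
    """
    traces = [sum(a * a for row in rows for a in row) / len(rows)
              for rows, _ in nodes]
    return 1.0 / max(traces)


def rounds_to_converge(nodes, local_steps, tol=1e-6, max_rounds=50000):
    """Count communication rounds until the global loss drops below tol.

    Assumed scheme: every node starts from the server model, takes
    local_steps gradient steps on its own loss, and the server averages.
    Returns None if tol is not reached within max_rounds.
    """
    dim = len(nodes[0][0][0])
    lr = safe_step_size(nodes)
    w_server = [0.0] * dim
    for round_idx in range(1, max_rounds + 1):
        local_models = []
        for node in nodes:
            w = list(w_server)
            for _ in range(local_steps):
                g = local_gradient(w, node)
                w = [x - lr * gj for x, gj in zip(w, g)]
            local_models.append(w)
        # Communication: the server averages the local models coordinate-wise.
        w_server = [sum(ws) / len(nodes) for ws in zip(*local_models)]
        if global_loss(w_server, nodes) <= tol:
            return round_idx
    return None


nodes = make_problem()
rounds = {steps: rounds_to_converge(nodes, steps) for steps in (1, 10, 100)}
for steps, r in rounds.items():
    print(f"local steps = {steps:3d} -> communication rounds = {r}")
```

Running the script prints the number of communication rounds needed for 1, 10, and 100 local steps per round. On this toy problem, more local steps should need fewer rounds, which matches the qualitative claim above. The sketch says nothing about non-convex losses or about data whose local optimal sets do not intersect.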