Local SGD in Overparameterized Linear Regression
This work addresses efficient distributed learning for overparameterized models and provides theoretical guarantees of use to practitioners, but it is incremental: it builds on existing SGD and regression frameworks.
The paper studies distributed learning with constant stepsize SGD (DSGD) in overparameterized linear regression. It proves general upper bounds with matching lower bounds on the excess risk and shows that the excess risk is of the order of the variance, provided the number of local nodes does not grow too fast with the global sample size. Compared with distributed ridge regression (DRR), DSGD attains a smaller excess risk at a sample complexity of the same order.
We consider distributed learning using constant stepsize SGD (DSGD) over several devices, each sending a final model update to a central server. In a final step, the local estimates are aggregated. In the setting of overparameterized linear regression, we prove general upper bounds with matching lower bounds and derive learning rates for specific data generating distributions. We show that the excess risk is of the order of the variance, provided the number of local nodes does not grow too fast with the global sample size. We further compare the sample complexity of DSGD with the sample complexity of distributed ridge regression (DRR) and show that the excess SGD-risk is smaller than the excess RR-risk, while both sample complexities are of the same order.
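The procedure described above (local constant stepsize SGD on each device, followed by a single aggregation step at the server) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's exact algorithm or experimental setup. The following choices are assumptions of the sketch: the data are split evenly across nodes, each node makes a single pass from a zero initialization and returns the average of its iterates, the server aggregates by a uniform average, the features are Gaussian with a polynomially decaying spectrum, and the stepsize is set to a fraction of 1/tr(Sigma). The function names local_sgd, dsgd and excess_risk are hypothetical.

```python
import numpy as np


def local_sgd(X, y, stepsize):
    """Single pass of constant-stepsize SGD on the squared loss, from w = 0.

    Returns the average of the iterates (iterate averaging is an
    assumption of this sketch).
    """
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for i in range(n):
        residual = X[i] @ w - y[i]          # prediction error on sample i
        w = w - stepsize * residual * X[i]  # stochastic gradient step
        w_sum += w
    return w_sum / n


def dsgd(X, y, num_nodes, stepsize):
    """Split the data across nodes, run local SGD, average the local estimates."""
    X_parts = np.array_split(X, num_nodes)
    y_parts = np.array_split(y, num_nodes)
    local_estimates = [
        local_sgd(X_m, y_m, stepsize) for X_m, y_m in zip(X_parts, y_parts)
    ]
    # Uniform averaging at the server (assumed aggregation rule).
    return np.mean(local_estimates, axis=0)


def excess_risk(w, w_star, eigenvalues):
    """Excess risk (w - w*)^T Sigma (w - w*) for Sigma = diag(eigenvalues)."""
    diff = w - w_star
    return float(np.sum(eigenvalues * diff**2))


# Synthetic overparameterized problem: dimension d exceeds sample size n.
rng = np.random.default_rng(0)
n, d, num_nodes = 400, 800, 4
eigenvalues = 1.0 / np.arange(1, d + 1) ** 2      # assumed spectrum of Sigma
w_star = np.zeros(d)
w_star[:5] = 1.0                                   # assumed ground truth
X = rng.standard_normal((n, d)) * np.sqrt(eigenvalues)
y = X @ w_star + 0.1 * rng.standard_normal(n)      # additive label noise

stepsize = 0.25 / np.sum(eigenvalues)              # a fraction of 1 / tr(Sigma)
w_bar = dsgd(X, y, num_nodes, stepsize)
print("excess risk of DSGD estimate:", excess_risk(w_bar, w_star, eigenvalues))
```

Varying num_nodes while keeping n fixed gives a rough empirical feel for the trade-off the abstract describes: more nodes mean fewer samples per node, so each local estimate is less accurate before averaging.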