DUAL-LOCO: Distributing Statistical Estimation Using Random Projections
DUAL-LOCO addresses communication bottlenecks in distributed machine learning when the data is partitioned across workers by feature. On real-world datasets it obtains better speedups than a state-of-the-art distributed optimization method while retaining good accuracy.
The paper tackles communication-efficient distributed statistical estimation by proposing DUAL-LOCO, which uses low-dimensional random projections in a single communication round to approximate the dependencies between features held by different workers. The approximation error is bounded and depends only weakly on the number of workers. On a variety of real-world datasets, DUAL-LOCO achieves better speedups than a state-of-the-art distributed optimization method while retaining good accuracy.
We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication, in which low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error that depends only weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real-world datasets and show that it obtains better speedups while retaining good accuracy.
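To make the single-round idea concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes a ridge regression objective, dense Gaussian projection matrices, summation of the other workers' sketches, and a direct primal solve on each worker; none of these choices is specified in the abstract. The function name `feature_distributed_ridge` and the parameters `n_workers`, `proj_dim`, and `lam` are hypothetical and exist only for illustration.

```python
import numpy as np


def feature_distributed_ridge(X, y, n_workers=4, proj_dim=20, lam=1.0, seed=0):
    """Illustrative one-round, feature-distributed ridge estimate.

    Assumptions (not taken from the abstract): ridge loss, Gaussian
    projections, summed sketches, and a direct linear solve per worker.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Partition the feature indices across the (simulated) workers.
    blocks = np.array_split(np.arange(p), n_workers)

    # Each worker compresses its own n x p_k block to an n x proj_dim sketch.
    sketches = []
    for idx in blocks:
        Pi = rng.standard_normal((len(idx), proj_dim)) / np.sqrt(proj_dim)
        sketches.append(X[:, idx] @ Pi)

    # The single communication round: every worker receives the summed sketch.
    total = sum(sketches)

    beta = np.zeros(p)
    for k, idx in enumerate(blocks):
        # Low-dimensional stand-in for the features held by all other workers.
        others = total - sketches[k]
        Z = np.hstack([X[:, idx], others])
        # Local ridge solve on raw local features plus the compressed remainder.
        w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
        # Keep only the coefficients that belong to this worker's raw features.
        beta[idx] = w[: len(idx)]
    return beta
```

Calling `feature_distributed_ridge(X, y)` on an `n x p` design matrix returns a length-`p` coefficient vector assembled from the per-worker solves. In this sketch, each worker sends and receives only `n x proj_dim` matrices, so the communication cost is independent of how many raw features the other workers hold.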